winterkoninkje: shadowcrane (clean) (Default)

So I've been messing around with my profile the last couple days. For the most part I tend to let it get rather stale, but every so often I go in to shuffle the interests around a bit, curse the 150 limit, see who's friended me, etc. And lately I've started taking part in an aspect of lj I've largely ignored until now.

One of the interesting things about lj is the social aspect to it. Not just the comments and the webwork of friending but also the communities, the interest searching, the schools, and so on. Lately I've been joining a number of communities and meeting new folks, mostly other Portlanders. Much as I prefer a number of the features of my own blogging software, it was never my intention to add that social aspect. I designed it to ease writing posts, in particular posts like mine which tend to have footnotes, references, and the like. I do have plans to add in comments, but that's about it.

If I were to write something for socializing, I'd probably write an engine for sophisticated interests tagging; remove the blogging entirely or have it just be a hook into your blog site of choice. The more I think about how I'd design such an interests machine, the more it starts to resemble certain other projects of mine. The first is a tagging engine for keeping track of large quantities of media files (this is anime, that's a photo from urban_decay, this is a Don Hertzfeldt short, that's pr0n,...). The second is to deal with one of my biggest gripes about iTunes: namely to provide a sophisticated system for categorizing genres, e.g. sometimes you want to be specific (ebm, darkwave, japanese swing, koto,...) and sometimes you just want to be general (electronica, japanese,... or heck: music).

What all three projects have in common is they're semantic-space tagging problems. The linguist in me leaps at the curiosity of breaking into these spaces, just as the hacker in me leaps at the challenge of designing an ontology general enough to describe these sorts of categorizations. Of course, for all of these problems you wouldn't want to just have a file somewhere that defines the relationships; rather, such definitions should be discovered dynamically through use. If a new subgenre of industrial called fooz shows up, I don't want to have to add it to some list somewhere; I should just be able to start tagging my songs as fooz. Certainly the relationship between fooz and industrial would need to be defined somewhere (say, by tagging something as industrial.fooz), but once defined we should be able to tag things as fooz without needing to restate all of the hierarchy of what it's related to.
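To make the "discovered by use" idea concrete, here's a rough sketch in Python. Everything in it (the `TagSpace` class, its method names, the song filenames) is made up for illustration; it just assumes that a dotted tag like industrial.fooz both tags the item and records that fooz sits under industrial.

```python
from collections import defaultdict


class TagSpace:
    """Hypothetical sketch: learns a tag hierarchy from dotted tags as they are used."""

    def __init__(self):
        self.parents = defaultdict(set)  # tag -> set of direct parent tags
        self.items = defaultdict(set)    # item -> set of tags applied directly

    def tag(self, item, dotted):
        """Tag an item; 'industrial.fooz' also records fooz as a child of industrial."""
        names = dotted.split(".")
        for parent, child in zip(names, names[1:]):
            self.parents[child].add(parent)
        self.items[item].add(names[-1])

    def ancestors(self, tag):
        """Return every more general tag reachable from `tag`."""
        seen = set()
        stack = [tag]
        while stack:
            for parent in self.parents.get(stack.pop(), ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen

    def matches(self, item, tag):
        """True if the item carries `tag` directly or via a more specific tag."""
        return any(
            t == tag or tag in self.ancestors(t)
            for t in self.items.get(item, ())
        )


space = TagSpace()
space.tag("song1.mp3", "industrial.fooz")  # defines the relationship once
space.tag("song2.mp3", "fooz")             # later uses need only the short name
found = space.matches("song2.mp3", "industrial")
```

Here `found` comes out true even though song2.mp3 was only ever tagged as fooz, since the hierarchy was picked up from the first tagging.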

There are a number of complications too. Subcategories aren't always proper subsets of some single more general term. Take futurepop for example. One could list it as a subgenre of industrial, but one could also list it as a subgenre of trance. So one should be able to refer to both industrial.futurepop and trance.futurepop and know that they are the same thing: futurepop. But sometimes subcategories with the same name are different and should not be joined. A contrived example but, say, food.java vs computers.java vs countries.java. Another complication — though trivially dealt with — is that some things may belong to multiple groups even though those groups aren't related, e.g. the Seatbelts are both swing music and japanese music, but that doesn't mean swing is japanese or that japanese is swing.
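A small Python sketch of how these cases might be told apart. The post doesn't say how the engine would know that java is three things while futurepop is one, so the explicit `keep_distinct` opt-out below is purely my assumption, as are the `TagGraph` name and everything else in the block.

```python
from collections import defaultdict


class TagGraph:
    """Hypothetical sketch: same-named tags merge unless marked as homonyms."""

    def __init__(self):
        self.parents = defaultdict(set)  # node -> direct parents
        self.homonyms = set()            # names that must not be merged

    def keep_distinct(self, name):
        """Assumed opt-out: treat `name` as a different tag under each parent."""
        self.homonyms.add(name)

    def resolve(self, dotted):
        """Map a dotted tag to a node id, recording parent links along the way."""
        names = dotted.split(".")
        node = names[0]
        for name in names[1:]:
            # Homonyms keep their parent as part of their identity.
            child = f"{node}.{name}" if name in self.homonyms else name
            self.parents[child].add(node)
            node = child
        return node


g = TagGraph()
g.keep_distinct("java")

same = g.resolve("industrial.futurepop") == g.resolve("trance.futurepop")
different = g.resolve("food.java") != g.resolve("computers.java")

# An item can carry unrelated tags without linking the tags themselves.
seatbelts = {g.resolve("swing"), g.resolve("japanese")}
swing_is_japanese = "japanese" in g.parents.get("swing", set())
```

With this, futurepop ends up as one node with two parents, the two javas stay separate, and tagging the Seatbelts as both swing and japanese says nothing about swing being japanese. A bare java with no parent stays ambiguous in this sketch; a real engine would have to ask or guess.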

It'll be a long while before I'd have a chance to work on such an engine, interesting as it'd be. I already have too many projects in progress (Paperboy, Titania, Eng, AIoOTuAMaaCAS[1]) and countless others on the heap of "would be interesting" (Paperboy UI[2], Oberon[3], Tsukuru[4], winterr's ai game[5], an iTunes now-playing menubar widget, a polymorphic ranking system for iTunes[6], a versioning filesystem[7], a windowing terminal multiplexor[8], CellOS,...). I've been meaning to update my list of projects to include all the ideas mentioned above before I forget about them. It'll be a good while before I have time to start new projects, which means I can thankfully delay needing to choose from so many worthy concepts.

[1] An implementation of Optimality Theory using analogy-making as a complex adaptive system. My second potential thesis topic, preferred over Eng because it's more linguistic and so would be better for getting into doctorate programs.

[2] A general interface adding "mailbox" capabilities to Paperboy. This would be the back end of the UI which commandline and GUI UIs would pull from; i.e. the logic as opposed to the presentation.

[3] An SQL-XML bridgework so that Titania could use a relational database behind the scenes but keep the same XML/XSLT interface.

[4] An ontology-driven replacement for GNU Make. My other application for Google's Summer of Code.

[5] A Master of Orion-like game using some sort of needs-based/genetic AI.

[6] I.e. so you can rank how much you like a song, an album, an artist, a genre,... and have the displayed rank be a composite of these other rankings. This project could also be rolled into the semantic-space tagging framework.

[7] An FS that would automatically keep track of older versions of files like CVS, Subversion, and other versioning systems.

[8] Something like a cross between Screen and an X server.

Date: 2006-06-20 01:25 pm (UTC) From: silmaril.livejournal.com
All that complexity... I think that's why LJ lets people define their own interests and tags; that's also why you end up having to be careful when doing a search-by-interest. Even a simple thing like alternate spellings---so I seem interested both in "medieaval music" and "medieval music," for instance, and I should add "early music" and "renaissance music" there also, even though those aren't equivalent, but they are closely related enough.... etc.

A lot to think about.

Date: 2006-06-22 06:15 am (UTC) From: winterkoninkje.livejournal.com
Were it not for the 150 limit, then sure, just doing the flat approach is the simplest solution. But, since there is a limit -- and a rather low one too, imo -- then there's the need to balance "well, do I put this in by its alternate spellings/names, or do I free up that precious slot for something else? And if I free it up, which spelling do I go with?"

Of course, if you get rid of the limit or set it (un)reasonably high, then you start getting into categorization/diffusion issues. Considering typical interests, it'd be helpful to be able to break things out into music, food, clothes, politics, etc. That way you could scan a bit of each category to get an idea of the person without needing to read through everything to make sure you covered the bases.

But then who defines the categories? If you had a more sophisticated ontological framework than just a simple flat listing of interests then you could leave it up to the users to create and adapt for themselves. If it were really sophisticated then it could deal with aliasing problems (i.e. spelling differences for the same thing) or "discover" new categories in my interests (e.g. if I list interest A and give it no category (or give it category C), and someone else lists A and puts it in category B, then one could look at my profile with a filter so A shows up in B).
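Roughly what I have in mind, as a Python sketch. The alias table, the function names, and the sample interests are all made up for illustration; it assumes a profile is just a mapping from interest to an optional category, and that a "filter" is somebody else's mapping of the same shape.

```python
from collections import defaultdict

# Hypothetical alias table: alternate spelling -> canonical spelling.
ALIASES = {"medieaval music": "medieval music"}


def canonical(interest):
    """Fold case and known alternate spellings into one name."""
    name = interest.lower()
    return ALIASES.get(name, name)


def view_through(profile, filter_categories):
    """Group a profile's interests, preferring the viewer's categories."""
    grouped = defaultdict(set)
    for interest, own_category in profile.items():
        key = canonical(interest)
        category = filter_categories.get(key, own_category) or "uncategorized"
        grouped[category].add(key)
    return {category: sorted(names) for category, names in grouped.items()}


mine = {"medieaval music": None, "Medieval Music": None, "koto": "music", "tea": None}
theirs = {"medieval music": "history"}
view = view_through(mine, theirs)
```

In `view`, both spellings collapse into one entry filed under the other person's "history" category, koto stays under my own "music", and tea lands in "uncategorized" since nobody categorized it.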
