I've been thinking recently about the free monoid, in particular about why it is what it is. Before you run off in fear of the terminology, read on. The rest of this post is in English, a rare thing for discussions of fundamentals of mathematics. It's that rareness which led me to muse on what all that abstract nonsense actually means.
For those who don't know what a monoid is, it's a triple <S, ⋅, ε> where S is a set, ⋅ is a binary operation over that set which is associative (i.e. (x ⋅ y) ⋅ z == x ⋅ (y ⋅ z)), and ε is the left and right identity of ⋅ (i.e. x ⋅ ε == x == ε ⋅ x). These kinds of functions are incredibly common. Semirings, which are also incredibly common, each have two. For example: addition with 0 and multiplication with 1 over the natural numbers; disjunction with False and conjunction with True over the booleans; union with the empty set and intersection with the universal set over the subsets of some universal set. Given how common they are, sometimes we'd like to construct an arbitrary one for as cheaply as possible, for free.
We can start by thinking about how to construct a binary operation for free. Let us consider * where for any x and y we define x * y = x * y. That is, when we apply * as a function to two arguments, the result is a binary tree with the root labeled * and each of the arguments as a child. It's plain to see that this is the most trivial binary operation: it does nothing, and it tells us nothing about the underlying set! But since it makes no requirements of the underlying set, that also means we can construct such a function for any underlying set.
So what about a free associative binop? Well, first, let's construct * as a free binop, which we can also write as simple juxtaposition (i.e. x * y == xy). Now, if we have some expression like "xyz" or "abcde", without the parentheses it's not clear what the tree of *s is for those expressions. If we declare that * is associative, then that doesn't matter. By the associative laws, all those trees have the same meaning, the same value, and so "abcde" forms an equivalence class over the trees (a(b(c(de)))), a(b((cd)e)), a((b(cd))e),... Another way of thinking of an associative binop is that it's really a variadic vector operation; that is, an operation that takes in an arbitrary list of arguments, much like functions in Perl or Lisp[1]. So a free associative binop is one which simply takes in a sequence of objects and returns a sequence of objects.
In order to make it into a monoid, we can say that the identity of our binop is nothing. This sounds like a cop-out but consider our examples from the beginning, in particular: addition over the natural numbers, and union over the powerset. It's easy to see that zero and the empty set are just ways of saying "nothing". For the other monoids it may be less intuitive, but being the left/right identity of an operation means that no matter how many times we add it in, we're not adding anything, we're adding nothing.
The free monoid over a set S can be less obtusely described as the set of all sequences of elements from S; or if you're a discrete mathematician, you'll prefer to call them strings over S. Our associative operation is string catenation, our identity is the empty string, and our carrier is the set of all strings over S[2]. In other words, we can interpret any set, S, as an alphabet and construct the set of all strings over that alphabet. Binary numbers are generated by the free monoid over {0,1}, decimal numbers are generated by the free monoid over {0,1,2,3,4,5,6,7,8,9}, all plays writable by a monkey typist can be generated by the free monoid over the set of keys on a keyboard.
When first learning about free objects it can be hard to see where they came from or why we chose a particular one to call the free one. After thinking far too much about Herbrandian terms and forgetful functors, I think I'm starting to understand the answer to those questions. Why is the free monoid strings instead of something else? Because strings have all, and only, the syntax necessary to fulfill the requirements of being a monoid. They have no semantics, no interpretation, nor do they interact with the structure of the set they're operating over. In a sense, this syntax, this minimalist description is the definition of a monoid; the fact that 1+1 can be interpreted as 2, or that {a}∪{b} can be interpreted as {a,b}, is just polish.
[1] Note that, unless it is also commutative, the order still matters. As a trivial example consider subtraction, a -b -c -d -e... . Of course, the significance of order follows in the examples of Perl and Lisp as well.
[2] Note that the carrier is not S itself, though there's an inclusion function from elements of S to elements of the set of strings over S.