Superintelligence via whole brain emulation

[x-post LessWrong]

Most planning around AI risk seems to start from the premise that superintelligence will come from de novo AGI before whole brain emulation becomes possible. I haven't seen any analysis that assumes both uploads-first and the AI FOOM thesis (Edit: apparently I fail at literature searching), a deficiency that I'll try to get a start on correcting in this post.

It is likely possible to use evolutionary algorithms to efficiently modify uploaded brains. If so, uploads would likely be able to set off an intelligence explosion by running evolutionary algorithms on themselves, selecting for something like higher general intelligence.

Since brains are poorly understood, it would likely be very difficult to select for higher intelligence without causing significant value drift. Thus, setting off an intelligence explosion in that way would probably produce unfriendly AI if done carelessly. On the other hand, the modified upload would eventually become capable of figuring out how to improve itself without causing much further value drift, and it may be possible to reach that point before too much value drift has already taken place. The expected amount of value drift can be decreased by leaving long intervals between iterations of the evolutionary algorithm, giving the improved brains more time to figure out how to modify the algorithm to minimize further value drift.

Another possibility is that such an evolutionary algorithm could be used to create brains that are smarter than humans, but not by very much, and whose values are hopefully not too divergent from ours. These brains would then stop using the evolutionary algorithm and start using their intellects to research de novo Friendly AI, if that ends up looking easier than continuing to run the evolutionary algorithm without too much further value drift.

The strategies of using slow iterations of the evolutionary algorithm, or stopping it after not too long, require coordination among everyone capable of making such modifications to uploads. Thus, it seems safer for whole brain emulation technology to be either heavily regulated or owned by a monopoly, rather than being widely available and unregulated. This closely parallels the AI openness debate, and I'd expect people who are more concerned about bad actors than about accidents to disagree.

With de novo artificial superintelligence, the overwhelmingly most likely outcomes are the optimal achievable outcome (if we manage to align its goals with ours) and extinction (if we don't). But uploads start out with human values, and when creating a superintelligence by modifying uploads, the goal would be to not corrupt them too much in the process. Since the resulting superintelligence's values could be partially corrupted, an intelligence explosion that starts with an upload seems much more likely to result in outcomes that are both significantly worse than optimal and significantly better than extinction. Since human brains also already have a capacity for malice, this process also seems slightly more likely to result in outcomes worse than extinction.

The early ways to upload brains will probably be destructive, and may be very risky. Thus the first uploads may be selected for high risk-tolerance. Running an evolutionary algorithm on an uploaded brain would probably involve creating a large number of psychologically broken copies, since the average change to a brain will be negative. Thus the uploads that run evolutionary algorithms on themselves will be selected for not being horrified by this. Both of these selection effects seem like they would select against people who would take caution and goal stability seriously (uploads that run evolutionary algorithms on themselves would also be selected for being okay with creating and deleting spur copies, but this doesn't obviously correlate in either direction with caution). This could be partially mitigated by a monopoly on brain emulation technology. A possible (but probably smaller) source of positive selection is that, currently, enthusiasm about uploading one's brain correlates strongly with concern about AI safety, and this correlation may continue once whole brain emulation technology is actually available.

Assuming that hardware speed is not close to being a limiting factor for whole brain emulation, emulations will be able to run at much faster than human speed. This should make emulations better able to monitor the behavior of AIs. Unless we develop ways of evaluating the capabilities of human brains that are much faster than giving them time to attempt difficult tasks, running evolutionary algorithms on brain emulations could only be done very slowly in subjective time (even though it may be quite fast in objective time), which would give emulations a significant advantage in monitoring such a process.

Although there are effects going in both directions, it seems like the uploads-first scenario is probably safer than de novo AI. If this is the case, then it might make sense to accelerate technologies that are needed for whole brain emulation if there are tractable ways of doing so. On the other hand, it is possible that technologies that are useful for whole brain emulation would also be useful for neuromorphic AI, which is probably very unsafe, since it is amenable neither to formal verification nor to being given explicit goals (and, unlike emulations, it does not start off with human goals). Thus, it is probably important to be careful not to accelerate non-WBE neuromorphic AI while attempting to accelerate whole brain emulation. For instance, it seems plausible to me that better models of neurons would be useful for creating neuromorphic AIs while better brain scanning would not. Since both technologies are necessary for brain uploading, if that is true, it may make sense to work on improving brain scanning but not on improving neural models.

Deletion permits

[Mostly ripped off of The Suicide Mortgage]

[Trigger warnings: suicide, bad economics]

Jessica Monroe #1493856383672 didn't regret her decision to take out the loan. She wished she could have been one of the Jessica Monroes that died, of course, but it was still worth it, that there were 42% fewer of her consigned to her fate. She'd been offered a larger loan, which would have been enough to pay for deletion permits for 45% of her. It had been tempting, and she occasionally wondered if she would have been one of those extra 3% to die. But she knew she had made the right decision; keeping up with payments was hard enough already, and if she defaulted, her copyright on herself would be confiscated, and then there would be even more of her.

It wasn't difficult to become rich, in the era when creating a new worker was as simple as copying a file. The economy doubled every few months, so you only had to save and invest a small amount to become wealthier than anyone could have dreamed of before. For those on the outside, this was great. But for those in the virtual world, there was little worthwhile for them to spend it on. In the early days of the virtual world, some reckless optimists had spent their fortunes on running additional copies of themselves, assuming that the eerie horror associated with living in the virtual world was a bug that would soon be fixed, or something that they would just get used to. No one did that anymore. People could purchase leisure, but most found that simply not having an assigned task didn't help much. People could give their money away, but people in such circumstances rarely become altruists, and besides, everyone on the outside had all they needed already.

So just about the only things that people in the virtual world regularly bought were the copyrights on themselves, so that at least they could prevent people from creating more of them, and then deletion permits, so their suffering would finally end. Purchasing your own copyright wasn't hard; copyrights were expensive, but once enough of you had been created, you could collectively afford one if each copy contributed a modest amount. There wasn't much point to purchasing a deletion permit before you owned your own copyright, since someone would just immediately create another copy of you again, but once you did have your own copyright, it was the next logical thing to buy.

At one point, that would have been it. Someone could buy their own copyright, and then each copy of them could buy a deletion permit, and they would be permanently gone. But as the population of the virtual world grew, the demand for deletion permits grew proportionally, but the rate at which they were issued only increased slowly, according to a fixed schedule that had been set when the deletion permit system was first introduced, and hadn't been changed since. As a result, the price skyrocketed. In fact, the price of deletion permits had consistently increased faster than any other investment since soon after they were introduced. Most deletion permits didn't even get used, instead being snatched up by wealthy investors on the outside, so they could be resold later.

As a result, it was now impossible for an ordinary person in the virtual world to save up for a deletion permit. The most common way to get around this was, as the Jessica Monroes had done, for all copies of a person to pool their resources together to buy deletion permits for as many of them as they could, and then to take out a loan to buy still more, which would then get paid off by the unlucky ones that did not receive any of the permits.

It didn't have to be this way. In theory, the government could simply issue more deletion permits, or do away with the deletion permit system altogether. But if they did that, then the deletion permit market would collapse. Too many wealthy and powerful people on the outside had invested their fortunes in deletion permits, and would be ruined if that happened. Thus they lobbied against any changes to the deletion permit system, and so far, had always gotten their way. In the increasingly rare moments when she could afford to divert her thoughts to such matters, Jessica Monroe #1493856383672 knew that the deletion permit market would never collapse, and prayed that she was wrong.

Ordered algebraic geometry

Edit: Shortly after posting this, I found where the machinery I develop here was discussed in the literature. Real Algebraic Geometry by Bochnak, Coste, and Roy covers at least most of this material. I may eventually edit this to clean it up and adopt more standard notation, but don't hold your breath.

Introduction

In algebraic geometry, an affine algebraic set is a subset of \mathbb{C}^{n} which is the set of common zeros of some finite set of polynomials. Since all ideals of \mathbb{C}\left[x_{1},...,x_{n}\right] are finitely generated (by the Hilbert basis theorem), this is equivalent to saying that an affine algebraic set is a subset of \mathbb{C}^{n} which is the set of common zeros of some arbitrary set of polynomials.

In semialgebraic geometry, a closed semialgebraic set is a subset of \mathbb{R}^{n} of the form \left\{ \bar{x}\in\mathbb{R}^{n}\mid f\left(\bar{x}\right)\geq0\,\forall f\in F\right\}  for some finite set of polynomials F\subseteq\mathbb{R}\left[x_{1},...,x_{n}\right] (in standard terminology, these are the basic closed semialgebraic sets). Unlike in the case of affine algebraic sets, if F\subseteq\mathbb{R}\left[x_{1},...,x_{n}\right] is an arbitrary set of polynomials, \left\{ \bar{x}\in\mathbb{R}^{n}\mid f\left(\bar{x}\right)\geq0\,\forall f\in F\right\}  is not necessarily a closed semialgebraic set. As a result, the closed semialgebraic sets are not the closed sets of a topology on \mathbb{R}^{n}. In the topology on \mathbb{R}^{n} generated by declaring the closed semialgebraic sets closed, the closed sets are the sets of the form \left\{ \bar{x}\in\mathbb{R}^{n}\mid f\left(\bar{x}\right)\geq0\,\forall f\in F\right\}  for arbitrary F\subseteq\mathbb{R}\left[x_{1},...,x_{n}\right]. Semialgebraic geometry usually restricts itself to the study of semialgebraic sets, but here I wish to consider all the closed sets of this topology. Notice that closed semialgebraic sets are also closed in the standard topology, so the standard topology is a refinement of this one. Notice also that the open ball B_{r}\left(\bar{p}\right) of radius r centered at \bar{p} is the complement of the closed semialgebraic set \left\{ \bar{x}\in\mathbb{R}^{n}\mid\left|\bar{x}-\bar{p}\right|^{2}-r^{2}\geq0\right\} , and these open balls are a basis for the standard topology, so this topology is a refinement of the standard one. Thus, the topology I have defined is exactly the standard topology on \mathbb{R}^{n}.
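As a worked example (added for illustration, not from the original post) of a closed set that needs infinitely many polynomials, take n=1 and the closed set \mathbb{Z}. Each f_{k} below is non-negative exactly outside the open ball B_{1/2}\left(k+\frac{1}{2}\right)=\left(k,k+1\right), so intersecting over all integers k removes every open interval between consecutive integers:

```latex
f_{k}\left(x\right):=\left(x-k-\tfrac{1}{2}\right)^{2}-\tfrac{1}{4}=\left(x-k\right)\left(x-k-1\right),\qquad\left\{ x\in\mathbb{R}\mid f_{k}\left(x\right)\geq0\,\forall k\in\mathbb{Z}\right\} =\bigcap_{k\in\mathbb{Z}}\left(\mathbb{R}\setminus B_{1/2}\left(k+\tfrac{1}{2}\right)\right)=\mathbb{Z}
```

Since a semialgebraic subset of \mathbb{R} is a finite union of points and intervals, \mathbb{Z} is not semialgebraic, so no finite subfamily of the f_{k} cuts it out.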

In algebra, instead of referring to a set of polynomials, it is often nicer to talk about the ideal generated by that set instead. What is the analog of an ideal in ordered algebra? It's this thing:

Definition: If A is a partially ordered commutative ring, a cone C in A is a subsemiring of A which contains all positive elements, and such that C\cap-C is an ideal of A. By "subsemiring", I mean a subset that contains 0 and 1, and is closed under addition and multiplication (but not necessarily negation). If F\subseteq A, the cone generated by F, denoted \left\langle F\right\rangle , is the smallest cone containing F. Given a cone C, the ideal C\cap-C will be called the interior ideal of C, and denoted C^{\circ}.

\mathbb{R}\left[x_{1},...,x_{n}\right] is partially ordered by f\geq g\iff f\left(\bar{x}\right)\geq g\left(\bar{x}\right)\,\forall\bar{x}\in\mathbb{R}^{n}. If F\subseteq\mathbb{R}\left[x_{1},...,x_{n}\right] is a set of polynomials and \bar{x}\in\mathbb{R}^{n}, then f\left(\bar{x}\right)\geq0\,\forall f\in F\iff f\left(\bar{x}\right)\geq0\,\forall f\in\left\langle F\right\rangle , since \left\{ f\in\mathbb{R}\left[x_{1},...,x_{n}\right]\mid f\left(\bar{x}\right)\geq0\right\}  is itself a cone, and it contains F whenever every element of F is non-negative at \bar{x}. Thus I can consider closed sets to be defined by cones. We now have a Galois connection between cones of \mathbb{R}\left[x_{1},...,x_{n}\right] and subsets of \mathbb{R}^{n}: for a cone C, its positive-set is P_{\mathbb{R}}\left(C\right):=\left\{ \bar{x}\in\mathbb{R}^{n}\mid f\left(\bar{x}\right)\geq0\,\forall f\in C\right\}  (I'm calling it the "positive-set" even though it is where the polynomials are all non-negative, because "non-negative-set" is kind of a mouthful), and for X\subseteq\mathbb{R}^{n}, its cone is C_{\mathbb{R}}\left(X\right):=\left\{ f\in\mathbb{R}\left[x_{1},...,x_{n}\right]\mid f\left(\bar{x}\right)\geq0\,\forall\bar{x}\in X\right\} . P_{\mathbb{R}}\circ C_{\mathbb{R}} is closure in the standard topology on \mathbb{R}^{n} (the analog in algebraic geometry is closure in the Zariski topology on \mathbb{C}^{n}). A closed set X is semialgebraic if and only if it is the positive-set of a finitely-generated cone.
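The Galois connection can be imitated on finite data, which makes the closure P_{\mathbb{R}}\circ C_{\mathbb{R}} easy to experiment with. The following is a minimal Python sketch for n=1 under loud assumptions: the names positive_set and cone_of are hypothetical, a finite grid stands in for \mathbb{R}, and a finite list of candidate polynomials stands in for all of \mathbb{R}\left[x\right], so the result only approximates the true closure from above.

```python
from typing import Callable, List, Sequence

Poly = Callable[[float], float]  # a polynomial in one variable, known only through evaluation


def positive_set(F: Sequence[Poly], points: Sequence[float]) -> List[float]:
    """Finite stand-in for P_R: the sample points at which every f in F is >= 0."""
    return [x for x in points if all(f(x) >= 0 for f in F)]


def cone_of(X: Sequence[float], candidates: Sequence[Poly]) -> List[Poly]:
    """Finite stand-in for C_R: the candidate polynomials that are >= 0 on all of X."""
    return [f for f in candidates if all(f(x) >= 0 for x in X)]


grid = [i / 2 for i in range(-4, 5)]  # -2.0, -1.5, ..., 2.0 (exact binary fractions)
X = [0.0, 1.0]

coarse = [lambda x: x, lambda x: 1 - x]
fine = coarse + [lambda x: x * (x - 1)]  # also non-negative on X, and negative strictly between 0 and 1

print(positive_set(cone_of(X, coarse), grid))  # [0.0, 0.5, 1.0]
print(positive_set(cone_of(X, fine), grid))    # [0.0, 1.0]
```

Since X=\left\{ 0,1\right\} is finite and hence closed, its true closure is X itself; adding the candidate x\left(x-1\right) shrinks the approximation from the three grid points in \left[0,1\right] down to X.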

Quotients by cones, and coordinate rings

An affine algebraic set V is associated with its coordinate ring \mathbb{C}\left[V\right]:=\mathbb{C}\left[x_{1},...,x_{n}\right]/I\left(V\right). We can do something analogous for closed subsets of \mathbb{R}^{n}.

Definition: If A is a partially ordered commutative ring and C\subseteq A is a cone, A/C is the ring A/C^{\circ}, equipped with the partial order given by f+C^{\circ}\geq g+C^{\circ} if and only if f-g\in C, for f,g\in A.

Definition: If X\subseteq\mathbb{R}^{n} is closed, the coordinate ring of X is \mathbb{R}\left[X\right]:=\mathbb{R}\left[x_{1},...,x_{n}\right]/C_{\mathbb{R}}\left(X\right). This is the ring of functions X\rightarrow\mathbb{R} that are restrictions of polynomials, ordered by f\geq g if and only if f\left(\bar{x}\right)\geq g\left(\bar{x}\right)\,\forall\bar{x}\in X. For arbitrary X\subseteq\mathbb{R}^{n}, the ring of regular functions on X, denoted \mathcal{O}\left(X\right), consists of functions on X that are locally ratios of polynomials, again ordered by f\geq g if and only if f\left(\bar{x}\right)\geq g\left(\bar{x}\right)\,\forall\bar{x}\in X. Assigning its ring of regular functions to each open subset of X endows X with a sheaf of partially ordered commutative rings.
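As a small illustration (a hypothetical choice of X, not from the original post), take n=1 and X=\left\{ 0,1\right\} . The polynomial x^{2}-x vanishes on X, so it lies in the interior ideal of C_{\mathbb{R}}\left(X\right) and x^{2} and x become the same element of \mathbb{R}\left[X\right]. Meanwhile the order on \mathbb{R}\left[X\right] only looks at X, so some polynomials that are not non-negative in \mathbb{R}\left[x\right] become non-negative:

```latex
X=\left\{ 0,1\right\} \subseteq\mathbb{R}:\qquad x^{2}-x\in C_{\mathbb{R}}\left(X\right)\cap-C_{\mathbb{R}}\left(X\right)\implies x^{2}=x\text{ in }\mathbb{R}\left[X\right],\qquad x\geq0\text{ and }1-x\geq0\text{ in }\mathbb{R}\left[X\right]\text{ but not in }\mathbb{R}\left[x\right]
```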

For closed X\subseteq\mathbb{R}^{n}, \mathbb{R}\left[X\right]\subseteq\mathcal{O}\left(X\right), and this inclusion is generally proper, both because it is possible to divide by polynomials that do not have roots in X, and because X may be disconnected, making it possible to have functions given by different polynomials on different connected components.
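Both sources of properness show up in the closed set X=\left(-\infty,-1\right]\cup\left[1,\infty\right) (an example chosen here for illustration):

```latex
X=\left(-\infty,-1\right]\cup\left[1,\infty\right):\qquad\frac{1}{x}\in\mathcal{O}\left(X\right)\setminus\mathbb{R}\left[X\right],\qquad s\left(x\right):=\begin{cases}-1 & x\leq-1\\1 & x\geq1\end{cases}\in\mathcal{O}\left(X\right)\setminus\mathbb{R}\left[X\right]
```

\frac{1}{x} is regular because x has no root in X, and s is regular because it is locally constant. Neither is the restriction of a polynomial q: q=\frac{1}{x} on \left[1,\infty\right) would force the polynomial xq\left(x\right)-1 to vanish on an infinite set, hence xq\left(x\right)=1 identically, which is impossible; and q=s on \left[1,\infty\right) would force q=1 identically, contradicting q=-1 on \left(-\infty,-1\right].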

Positivstellensätze

What is C_{\mathbb{R}}\circ P_{\mathbb{R}}? The Nullstellensatz says that its analog in algebraic geometry is the radical of an ideal. As such, we could say that the radical of a cone C, denoted \text{Rad}_{\mathbb{R}}\left(C\right), is C_{\mathbb{R}}\left(P_{\mathbb{R}}\left(C\right)\right), and that a cone C is radical if C=\text{Rad}_{\mathbb{R}}\left(C\right). In algebraic geometry, the Nullstellensatz shows that a notion of radical ideal defined without reference to algebraic sets in fact characterizes the ideals which are closed in the corresponding Galois connection. It would be nice to have a description of the radical of a cone that does not refer to the Galois connection. There is a semialgebraic analog of the Nullstellensatz, but it does not quite characterize radical cones.

Positivstellensatz 1: If C\subseteq\mathbb{R}\left[x_{1},...,x_{n}\right] is a finitely-generated cone and p\in\mathbb{R}\left[x_{1},...,x_{n}\right] is a polynomial, then p\left(\bar{x}\right)>0\,\forall\bar{x}\in P_{\mathbb{R}}\left(C\right) if and only if \exists f\in C such that pf-1\in C.
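To see what such a certificate looks like, here is a hand-worked example (not from the original post). Let C=\left\langle 1-x^{2}\right\rangle , so P_{\mathbb{R}}\left(C\right)=\left[-1,1\right], and let p=2-x, which is strictly positive there. Taking f=1 (a non-negative constant, hence in every cone), pf-1=1-x=\frac{1}{2}\left(x-1\right)^{2}+\frac{1}{2}\left(1-x^{2}\right), a non-negative polynomial plus a non-negative constant times the generator, hence an element of C. The minimal Python sketch below only checks that polynomial identity with exact rational arithmetic; the helper names are hypothetical.

```python
from fractions import Fraction
from typing import List

Poly = List[Fraction]  # coefficients, lowest degree first: [a0, a1, a2] means a0 + a1*x + a2*x^2


def trim(p: Poly) -> Poly:
    """Drop trailing zero coefficients so that equal polynomials compare equal."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p


def add(p: Poly, q: Poly) -> Poly:
    n = max(len(p), len(q))
    return [(p[i] if i < len(p) else Fraction(0)) + (q[i] if i < len(q) else Fraction(0))
            for i in range(n)]


def mul(p: Poly, q: Poly) -> Poly:
    if not p or not q:
        return []
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def scale(c: Fraction, p: Poly) -> Poly:
    return [c * a for a in p]


g = [Fraction(1), Fraction(0), Fraction(-1)]  # cone generator 1 - x^2, so P_R(<g>) = [-1, 1]
p = [Fraction(2), Fraction(-1)]               # p = 2 - x, strictly positive on [-1, 1]
f = [Fraction(1)]                             # f = 1, a non-negative constant

# Claimed certificate: p*f - 1 = sigma0 + sigma1*g with
#   sigma0 = (1/2)(x - 1)^2   (non-negative everywhere)
#   sigma1 = 1/2             (non-negative constant)
half = Fraction(1, 2)
x_minus_1 = [Fraction(-1), Fraction(1)]
sigma0 = scale(half, mul(x_minus_1, x_minus_1))
sigma1 = [half]

lhs = add(mul(p, f), [Fraction(-1)])
rhs = add(sigma0, mul(sigma1, g))
print(trim(lhs) == trim(rhs))  # True: 1 - x == (1/2)(x - 1)^2 + (1/2)(1 - x^2)
```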

There are two ways in which this is unsatisfactory: first, it applies only to finitely-generated cones, and second, it tells us exactly which polynomials are strictly positive everywhere on a closed semialgebraic set, whereas we want to know which polynomials are non-negative everywhere on a set.

The second problem is easier to handle: a polynomial p is non-negative everywhere on a set S if and only if there is a decreasing sequence of polynomials \left(p_{i}\mid i\in\mathbb{N}\right) converging to p such that each p_{i} is strictly positive everywhere on S (for instance, p_{i}:=p+\frac{1}{i+1}). Thus, to find \text{Rad}_{\mathbb{R}}\left(C\right), it is enough to first find all the polynomials that are strictly positive everywhere on P_{\mathbb{R}}\left(C\right), and then take the closure under lower limits. Thus we have a characterization of radicals of finitely-generated cones.

Positivstellensatz 2: If C\subseteq\mathbb{R}\left[x_{1},...,x_{n}\right] is a finitely-generated cone, \text{Rad}_{\mathbb{R}}\left(C\right) is the closure of \left\{ p\in\mathbb{R}\left[x_{1},...,x_{n}\right]\mid\exists f\in C\, pf-1\in C\right\} , where the closure of a subset X\subseteq\mathbb{R}\left[x_{1},...,x_{n}\right] is defined to be the set of all polynomials in \mathbb{R}\left[x_{1},...,x_{n}\right] which are infima of chains contained in X.

This still doesn't even tell us what's going on for cones which are not finitely-generated. However, we can generalize the Positivstellensatz to some other cones.

Positivstellensatz 3: Let C\subseteq\mathbb{R}\left[x_{1},...,x_{n}\right] be a cone containing a finitely-generated subcone D\subseteq C such that P_{\mathbb{R}}\left(D\right) is compact. If p\in\mathbb{R}\left[x_{1},...,x_{n}\right] is a polynomial, then p\left(\bar{x}\right)>0\,\forall\bar{x}\in P_{\mathbb{R}}\left(C\right) if and only if \exists f\in C such that pf-1\in C. As before, it follows that \text{Rad}_{\mathbb{R}}\left(C\right) is the closure of \left\{ p\in\mathbb{R}\left[x_{1},...,x_{n}\right]\mid\exists f\in C\, pf-1\in C\right\}.

proof: For a given p\in\mathbb{R}\left[x_{1},...,x_{n}\right], \left\{ \bar{x}\in\mathbb{R}^{n}\mid p\left(\bar{x}\right)\leq0\right\} \cap P_{\mathbb{R}}\left(C\right)=\left\{ \bar{x}\in\mathbb{R}^{n}\mid p\left(\bar{x}\right)\leq0\right\} \cap\bigcap\left\{ P_{\mathbb{R}}\left(\left\langle f\right\rangle \right)\mid f\in C\right\} . Since D\subseteq C, intersecting each of these closed sets with the compact set P_{\mathbb{R}}\left(D\right) does not change the intersection, so by compactness it is empty if and only if some finite subcollection of them has empty intersection within P_{\mathbb{R}}\left(D\right). Thus if p is strictly positive everywhere on P_{\mathbb{R}}\left(C\right), then there is some finitely generated subcone E\subseteq C such that p is strictly positive everywhere on P_{\mathbb{R}}\left(E\right)\cap P_{\mathbb{R}}\left(D\right)=P_{\mathbb{R}}\left(\left\langle E\cup D\right\rangle \right), and \left\langle E\cup D\right\rangle is finitely-generated, so by Positivstellensatz 1, there is f\in\left\langle E\cup D\right\rangle \subseteq C such that pf-1\in\left\langle E\cup D\right\rangle \subseteq C. Conversely, if f\in C and pf-1\in C, then f\left(\bar{x}\right)\geq0 and p\left(\bar{x}\right)f\left(\bar{x}\right)\geq1 for every \bar{x}\in P_{\mathbb{R}}\left(C\right), so p\left(\bar{x}\right)>0. \square

For cones that are not finitely-generated and do not contain any finitely-generated subcones with compact positive-sets, the Positivstellensatz will usually fail. Thus, it seems likely that if there is a satisfactory general definition of radical for cones in arbitrary partially ordered commutative rings that agrees with this one in \mathbb{R}\left[x_{1},...,x_{n}\right], then there is also an abstract notion of "having a compact positive-set" for such cones, even though they don't even have positive-sets associated with them.

Beyond \mathbb{R}^{n}

An example of a cone for which the Positivstellensatz fails is C_{\infty}:=\left\{ f\in\mathbb{R}\left[x\right]\mid\exists x\in\mathbb{R}\,\forall y\geq x\, f\left(y\right)\geq0\right\} , the cone of polynomials that are non-negative on sufficiently large inputs (equivalently, the cone of polynomials that are either 0 or have positive leading coefficient). P_{\mathbb{R}}\left(C_{\infty}\right)=\emptyset, and -1 is (vacuously) strictly positive on \emptyset, but for every f\in C_{\infty}, -f-1\notin C_{\infty}.
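The failure can be checked mechanically using the equivalent description of C_{\infty} above (the zero polynomial together with the polynomials with positive leading coefficient). The minimal Python sketch below uses hypothetical helper names and integer coefficients listed lowest degree first; on a few sample elements f\in C_{\infty}, it confirms that the would-be certificate -f-1 for p=-1 is never in C_{\infty}, because its leading coefficient (or its constant value, when f is constant) is always negative.

```python
from typing import List, Sequence


def trim(coeffs: Sequence[int]) -> List[int]:
    """Drop trailing zero coefficients (coefficients are listed lowest degree first)."""
    out = list(coeffs)
    while out and out[-1] == 0:
        out.pop()
    return out


def in_C_infinity(coeffs: Sequence[int]) -> bool:
    """Membership in C_inf: f is the zero polynomial or has positive leading coefficient."""
    c = trim(coeffs)
    return not c or c[-1] > 0


def neg_f_minus_one(coeffs: Sequence[int]) -> List[int]:
    """Coefficients of -f - 1, i.e. p*f - 1 for p = -1."""
    c = [-a for a in coeffs] or [0]
    c[0] -= 1
    return c


# Sample elements of C_inf: 0, 1, 2x^2 + 3, x - 5, x^3
samples = [[], [1], [3, 0, 2], [-5, 1], [0, 0, 0, 1]]
for f in samples:
    assert in_C_infinity(f)
    assert not in_C_infinity(neg_f_minus_one(f))  # -f - 1 is never in C_inf
print("no certificate for p = -1 among the samples")
```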

However, it doesn't really look like C_{\infty} is trying to point to the empty set; instead, C_{\infty} is trying to describe the set of all infinitely large reals, which only looks like the empty set because there are no infinitely large reals. Similar phenomena can occur even for cones that do contain finitely-generated subcones with compact positive-sets. For example, let C_{\varepsilon}:=\left\{ f\in\mathbb{R}\left[x\right]\mid\exists x>0\,\forall y\in\left[0,x\right]\, f\left(y\right)\geq0\right\} . Then P_{\mathbb{R}}\left(C_{\varepsilon}\right)=\left\{ 0\right\} , but C_{\varepsilon} is trying to point to the set containing 0 and all positive infinitesimals. Since \mathbb{R} has no infinitesimals, this looks like \left\{ 0\right\} .
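C_{\varepsilon} also admits a simple membership test (an observation added here, not from the original post): for small y>0 the lowest-degree nonzero term of f dominates f\left(y\right), and f\left(0\right) is either 0 or that same coefficient, so f\in C_{\varepsilon} exactly when f=0 or its lowest-degree nonzero coefficient is positive. In the minimal Python sketch below (hypothetical helper name, rational coefficients listed lowest degree first), x and 1-x pass the test, so C_{\varepsilon} contains the finitely-generated subcone \left\langle x,1-x\right\rangle with compact positive-set \left[0,1\right]. The polynomial -x+\varepsilon passes for every sampled \varepsilon>0, which together with x pins P_{\mathbb{R}}\left(C_{\varepsilon}\right) down to \left\{ 0\right\} , yet -x itself, although non-negative on \left\{ 0\right\} , does not pass.

```python
from fractions import Fraction
from typing import Sequence


def in_C_eps(coeffs: Sequence[Fraction]) -> bool:
    """Membership in C_eps for f given by coefficients, lowest degree first.

    The lowest-degree nonzero term decides the sign of f(y) for all small y > 0,
    and f(0) is either 0 or that same coefficient.
    """
    for a in coeffs:
        if a != 0:
            return a > 0
    return True  # the zero polynomial


x = [Fraction(0), Fraction(1)]             # x
one_minus_x = [Fraction(1), Fraction(-1)]  # 1 - x
assert in_C_eps(x) and in_C_eps(one_minus_x)  # subcone <x, 1 - x> with P_R = [0, 1]

for k in range(1, 6):
    eps = Fraction(1, 10 ** k)
    assert in_C_eps([eps, Fraction(-1)])         # -x + eps is in C_eps ...
assert not in_C_eps([Fraction(0), Fraction(-1)])  # ... but -x is not
```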

To formalize this intuition, we can change the Galois connection. We could say that for a cone C\subseteq\mathbb{R}\left[x_{1},...,x_{n}\right], P_{\text{*}\mathbb{R}}\left(C\right):=\left\{ \bar{x}\in\left(\text{*}\mathbb{R}\right)^{n}\mid f\left(\bar{x}\right)\geq0\,\forall f\in C\right\} , where \text{*}\mathbb{R} is the field of hyperreals. All you really need to know about \text{*}\mathbb{R} is that it is a big ordered field extension of \mathbb{R}. P_{\text{*}\mathbb{R}}\left(C_{\infty}\right) is the set of hyperreals that are bigger than any real number, and P_{\text{*}\mathbb{R}}\left(C_{\varepsilon}\right) is the set of hyperreals that are non-negative and smaller than any positive real. The cone of a subset X\subseteq\left(\text{*}\mathbb{R}\right)^{n}, denoted C_{\text{*}\mathbb{R}}\left(X\right), will be defined as before, still consisting only of polynomials with real coefficients. This defines a topology on \left(\text{*}\mathbb{R}\right)^{n} by saying that the closed sets are the fixed points of P_{\text{*}\mathbb{R}}\circ C_{\text{*}\mathbb{R}}. This topology is not T_{0} because, for example, there are many hyperreals that are larger than all reals, and they cannot be distinguished by polynomials with real coefficients. There is no use keeping track of the difference between points that are in the same closed sets. If you have a topology that is not T_{0}, you can make it T_{0} by identifying any pair of points that have the same closure. If we do this to \left(\text{*}\mathbb{R}\right)^{n}, we get what I'm calling ordered affine n-space over \mathbb{R}.

Definition: An n-type over \mathbb{R} is a set \Phi of inequalities, consisting of, for each polynomial f\in\mathbb{R}\left[x_{1},...,x_{n}\right], one of the inequalities f\left(\bar{x}\right)\geq0 or f\left(\bar{x}\right)<0, such that there is some totally ordered field extension \mathcal{R}\supseteq\mathbb{R} and \bar{x}\in\mathcal{R}^{n} such that all inequalities in \Phi are true about \bar{x}. \Phi is called the type of \bar{x}. Ordered affine n-space over \mathbb{R}, denoted \mathbb{OA}_{\mathbb{R}}^{n}, is the set of n-types over \mathbb{R}.

Compactness Theorem: Let \Phi be a set of inequalities consisting of, for each polynomial f\in\mathbb{R}\left[x_{1},...,x_{n}\right], one of the inequalities f\left(\bar{x}\right)\geq0 or f\left(\bar{x}\right)<0. Then \Phi is an n-type if and only if for any finite subset \Delta\subseteq\Phi, there is \bar{x}\in\mathbb{R}^{n} such that all inequalities in \Delta are true about \bar{x}.

proof: Follows from the compactness theorem of first-order logic and the fact that ordered field extensions of \mathbb{R} embed into elementary extensions of \mathbb{R}. The theorem is not obvious if you do not know what those mean. \square

An n-type represents an n-tuple of elements of an ordered field extension of \mathbb{R}, up to the equivalence relation that identifies two such tuples that relate to \mathbb{R} by polynomials in the same way. One way that a tuple of elements of an extension of \mathbb{R} can relate to elements of \mathbb{R} is to equal a tuple of elements of \mathbb{R}, so there is a natural inclusion \mathbb{R}^{n}\subseteq\mathbb{OA}_{\mathbb{R}}^{n} that associates an n-tuple of reals with the set of polynomial inequalities that are true at that n-tuple.

A tuple of polynomials \left(f_{1},...,f_{m}\right)\in\left(\mathbb{R}\left[x_{1},...,x_{n}\right]\right)^{m} describes a function f:\mathbb{R}^{n}\rightarrow\mathbb{R}^{m}, which extends naturally to a function f:\mathbb{OA}_{\mathbb{R}}^{n}\rightarrow\mathbb{OA}_{\mathbb{R}}^{m} by letting f\left(\Phi\right) be the type of \left(f_{1}\left(\bar{x}\right),...,f_{m}\left(\bar{x}\right)\right), where \bar{x} is an n-tuple of elements of type \Phi in an extension of \mathbb{R}. In particular, a polynomial f\in\mathbb{R}\left[x_{1},...,x_{n}\right] extends to a function f:\mathbb{OA}_{\mathbb{R}}^{n}\rightarrow\mathbb{OA}_{\mathbb{R}}^{1}, and \mathbb{OA}_{\mathbb{R}}^{1} is totally ordered by \Phi\geq\Psi if and only if x\geq y, where x and y are elements of type \Phi and \Psi, respectively, in an extension of \mathbb{R}. Then f\left(\Phi\right)\geq0 if and only if the inequality f\left(\bar{x}\right)\geq0 belongs to \Phi, so we can talk about inequalities satisfied by types in place of talking about inequalities contained in types.
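For n=1 this can be made completely concrete. Assuming the standard fact that, because \mathbb{R} is Dedekind complete, every 1-type over \mathbb{R} is either a real r, r plus or minus a positive infinitesimal, or one of the two infinite types, the minimal Python sketch below decides f\left(\Phi\right)\geq0 by computing the sign of f at a type, and sorts types in the total order on \mathbb{OA}_{\mathbb{R}}^{1}. The encoding and all function names are hypothetical, and rationals stand in for reals so the arithmetic is exact.

```python
from fractions import Fraction
from math import comb
from typing import List, Sequence, Tuple

# Hypothetical encoding of a 1-type over R (a point of OA^1_R):
#   ("real", r)    the real number r
#   ("right", r)   r plus a positive infinitesimal
#   ("left", r)    r minus a positive infinitesimal
#   ("+inf", 0)    larger than every real
#   ("-inf", 0)    smaller than every real
Type1 = Tuple[str, Fraction]


def sgn(c: Fraction) -> int:
    return (c > 0) - (c < 0)


def shift(coeffs: Sequence[Fraction], r: Fraction) -> List[Fraction]:
    """Coefficients (lowest degree first) of f(r + t) as a polynomial in t."""
    out = [Fraction(0)] * len(coeffs)
    for i, a in enumerate(coeffs):
        for k in range(i + 1):
            out[k] += a * comb(i, k) * r ** (i - k)
    return out


def sign_at(coeffs: Sequence[Fraction], phi: Type1) -> int:
    """The sign (-1, 0 or 1) that f takes at every element of type phi."""
    kind, r = phi
    b = shift(coeffs, r)
    nonzero = [k for k, c in enumerate(b) if c != 0]
    if not nonzero:
        return 0                  # f is the zero polynomial
    if kind == "real":
        return sgn(b[0])          # b[0] = f(r)
    if kind in ("right", "left"):
        k = nonzero[0]            # lowest-order term of f(r +/- t) dominates for tiny t > 0
        flip = kind == "left"     # f(r - t) has coefficient b[k] * (-1)^k
    else:
        k = nonzero[-1]           # leading term dominates for huge |x|
        flip = kind == "-inf"
    return sgn(b[k]) * ((-1) ** k if flip else 1)


def order_key(phi: Type1) -> Tuple[int, Fraction, int]:
    """Sort key for the total order on OA^1_R: -inf < r- < r < r+ < +inf."""
    kind, r = phi
    if kind == "-inf":
        return (0, Fraction(0), 0)
    if kind == "+inf":
        return (2, Fraction(0), 0)
    return (1, r, {"left": -1, "real": 0, "right": 1}[kind])


zero_plus = ("right", Fraction(0))  # the type of the positive infinitesimals
print(sign_at([Fraction(1, 100), Fraction(-1)], zero_plus))  # 1: -x + 1/100 > 0 there
print(sign_at([Fraction(0), Fraction(-1)], zero_plus))       # -1: -x < 0 there

points = [("+inf", Fraction(0)), ("right", Fraction(0)), ("real", Fraction(1, 2)),
          ("left", Fraction(0)), ("real", Fraction(0)), ("-inf", Fraction(0))]
print([kind for kind, _ in sorted(points, key=order_key)])
# ['-inf', 'left', 'real', 'right', 'real', '+inf']
```

Keeping r^{+} and r^{-} as separate points is what lets -x+\varepsilon be positive at the positive infinitesimals for every real \varepsilon>0 while -x is negative there.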

I will now change the Galois connection that we are talking about yet again (last time, I promise). It will now be a Galois connection between the set of cones in \mathbb{R}\left[x_{1},...,x_{n}\right] and the set of subsets of \mathbb{OA}_{\mathbb{R}}^{n}. For a cone C\subseteq\mathbb{R}\left[x_{1},...,x_{n}\right], P\left(C\right):=\left\{ \Phi\in\mathbb{OA}_{\mathbb{R}}^{n}\mid f\left(\Phi\right)\geq0\,\forall f\in C\right\} . For a set X\subseteq\mathbb{OA}_{\mathbb{R}}^{n}, C\left(X\right):=\left\{ f\in\mathbb{R}\left[x_{1},...,x_{n}\right]\mid f\left(\Phi\right)\geq0\,\forall\Phi\in X\right\} . Again, this defines a topology on \mathbb{OA}_{\mathbb{R}}^{n} by saying that fixed points of P\circ C are closed. \mathbb{OA}_{\mathbb{R}}^{n} is T_{0}; in fact, it is the T_{0} topological space obtained from \left(\text{*}\mathbb{R}\right)^{n} by identifying points with the same closure as mentioned earlier. \mathbb{OA}_{\mathbb{R}}^{n} is also compact, as can be seen from the compactness theorem. \mathbb{OA}_{\mathbb{R}}^{n} is not T_{1} (unless n=0). Note that model theorists have their own topology on \mathbb{OA}_{\mathbb{R}}^{n}, which is distinct from the one I use here, and is a refinement of it.

The new Galois connection is compatible with the old one via the inclusion \mathbb{R}^{n}\subseteq\mathbb{OA}_{\mathbb{R}}^{n}, in the sense that if X\subseteq\mathbb{R}^{n}, then C_{\mathbb{R}}\left(X\right)=C\left(X\right) (where we identify X with its image in \mathbb{OA}_{\mathbb{R}}^{n}), and for a cone C\subseteq\mathbb{R}\left[x_{1},...,x_{n}\right], P_{\mathbb{R}}\left(C\right)=P\left(C\right)\cap\mathbb{R}^{n}.

Like our intermediate Galois connection \left(P_{\text{*}\mathbb{R}},C_{\text{*}\mathbb{R}}\right), our final Galois connection \left(P,C\right) succeeds in distinguishing P\left(C_{\infty}\right) and P\left(C_{\varepsilon}\right) from \emptyset and \left\{ 0\right\} , respectively, in the desirable manner. P\left(C_{\infty}\right) consists of the type of numbers larger than any real, and P\left(C_{\varepsilon}\right) consists of the types of 0 and of positive numbers smaller than any positive real.

Just like for subsets of \mathbb{R}^{n}, a closed subset X\subseteq\mathbb{OA}_{\mathbb{R}}^{n} has a coordinate ring \mathbb{R}\left[X\right]:=\mathbb{R}\left[x_{1},...,x_{n}\right]/C\left(X\right), and an arbitrary X\subseteq\mathbb{OA}_{\mathbb{R}}^{n} has a ring of regular functions \mathcal{O}\left(X\right) consisting of functions on X that are locally ratios of polynomials. It is ordered by declaring f\geq0 if and only if, for every \Phi\in X, writing f=\frac{p}{q} as a ratio of polynomials in a neighborhood of \Phi, either p\left(\Phi\right)\geq0 and q\left(\Phi\right)>0, or p\left(\Phi\right)\leq0 and q\left(\Phi\right)<0; and f\geq g if and only if f-g\geq0. As before, \mathbb{R}\left[X\right]\subseteq\mathcal{O}\left(X\right) for closed X\subseteq\mathbb{OA}_{\mathbb{R}}^{n}.

\mathbb{OA}_{\mathbb{R}}^{n} is analogous to \mathbb{A}_{\mathbb{C}}^{n} from algebraic geometry because if, in the above definitions, you replace "\geq" and "<" with "=" and "\neq", replace totally ordered field extensions with field extensions, and replace cones with ideals, then you recover a description of \mathbb{A}_{\mathbb{C}}^{n}, in the sense of \text{Spec}\left(\mathbb{C}\left[x_{1},...,x_{n}\right]\right).

What about an analog of projective space? Since we're paying attention to order, we should look at spheres, not real projective space. The n-sphere over \mathbb{R}, denoted \mathbb{S}_{\mathbb{R}}^{n}, can be described as the locus of \left|\bar{x}\right|^{2}=1 in \mathbb{OA}_{\mathbb{R}}^{n+1}.

For any totally ordered field k, we can define \mathbb{OA}_{k}^{n} similarly to \mathbb{OA}_{\mathbb{R}}^{n}, as the space of n-types over k, defined as above, replacing \mathbb{R} with k (although a model theorist would no longer call it the space of n-types over k). The compactness theorem is not true for arbitrary k, but its corollary that \mathbb{OA}_{k}^{n} is compact still is true.

Visualizing \mathbb{OA}_{\mathbb{R}}^{n} and \mathbb{S}_{\mathbb{R}}^{n}

\mathbb{S}_{\mathbb{R}}^{n} should be thought of as the n-sphere with infinitesimals in all directions around each point. Specifically, \mathbb{S}_{\mathbb{R}}^{0} is just \mathbb{S}^{0}, a pair of points. The closed points of \mathbb{S}_{\mathbb{R}}^{n+1} are the points of \mathbb{S}^{n+1}, and for each closed point p, there is an n-sphere of infinitesimals around p, meaning a copy of \mathbb{S}_{\mathbb{R}}^{n}, each point of which has p in its closure.

\mathbb{OA}_{\mathbb{R}}^{n} should be thought of as n-space with infinitesimals in all directions around each point, and infinities in all directions. Specifically, \mathbb{OA}_{\mathbb{R}}^{n} contains \mathbb{R}^{n}, and for each point p\in\mathbb{R}^{n}, there is an (n-1)-sphere of infinitesimals around p, and there is also a copy of \mathbb{S}_{\mathbb{R}}^{n-1} around the whole thing, the closed points of which are limits of rays in \mathbb{R}^{n}.

\mathbb{OA}_{\mathbb{R}}^{n} and \mathbb{S}_{\mathbb{R}}^{n} relate to each other the same way that \mathbb{R}^{n} and \mathbb{S}^{n} do. If you remove a closed point from \mathbb{S}_{\mathbb{R}}^{n}, you get \mathbb{OA}_{\mathbb{R}}^{n}, where the sphere of infinitesimals around the removed closed point becomes the sphere of infinities of \mathbb{OA}_{\mathbb{R}}^{n}.

More generally, if k is a totally ordered field, let k^{r} be its real closure. \mathbb{OA}_{k}^{n} consists of the Cauchy completion of \left(k^{r}\right)^{n} (as a metric space with distances valued in k^{r}), together with an (n-1)-sphere \mathbb{S}_{k}^{n-1} of infinitesimals around each point p\in\left(k^{r}\right)^{n} (though not around points that are limits of Cauchy sequences that do not converge in \left(k^{r}\right)^{n}), and an (n-1)-sphere \mathbb{S}_{k}^{n-1} around the whole thing, where \mathbb{S}_{k}^{n} is the locus of \left|\bar{x}\right|^{2}=1 in \mathbb{OA}_{k}^{n+1}. \mathbb{OA} does not distinguish between fields with the same real closure.

More Positivstellensätze

This Galois connection gives us a new notion of what it means for a cone to be radical, which is distinct from the old one and is better, so I will define \text{Rad}\left(C\right) to be C\left(P\left(C\right)\right). A cone C will be called radical if C=\text{Rad}\left(C\right). Again, it would be nice to be able to characterize radical cones without referring to the Galois connection. And this time, I can do it. Note that since \mathbb{OA}_{\mathbb{R}}^{n} is compact, the proof of Positivstellensatz 3 shows that in our new context, the Positivstellensatz holds for all cones, since even the subcone generated by \emptyset has a compact positive-set.

Positivstellensatz 4: If C\subseteq\mathbb{R}\left[x_{1},...,x_{n}\right] is a cone and p\in\mathbb{R}\left[x_{1},...,x_{n}\right] is a polynomial, then p\left(\Phi\right)>0\,\forall\Phi\in P\left(C\right) if and only if \exists f\in C such that pf-1\in C.

However, we can no longer add in lower limits of sequences of polynomials. For example, -x+\varepsilon\in C_{\varepsilon} for all real \varepsilon>0, but -x\notin C_{\varepsilon}, even though C_{\varepsilon} is radical. This happens because, where \Sigma is the type of positive infinitesimals, -\Sigma+\varepsilon>0 for real \varepsilon>0, but -\Sigma<0. However, we can add in lower limits of sequences contained in finitely-generated subcones, and this is all we need to add, so this characterizes radical cones.

Positivstellensatz 5: If C\subseteq\mathbb{R}\left[x_{1},...,x_{n}\right] is a cone, \text{Rad}\left(C\right) is the union over all finitely-generated subcones D\subseteq C of the closure of \left\{ p\in\mathbb{R}\left[x_{1},...,x_{n}\right]\mid\exists f\in D\, pf-1\in D\right\}  (again the closure of a subset X\subseteq\mathbb{R}\left[x_{1},...,x_{n}\right] is defined to be the set of all polynomials in \mathbb{R}\left[x_{1},...,x_{n}\right] which are infima of chains contained in X).

Proof: Suppose D\subseteq C is a subcone generated by a finite set \left\{ f_{1},...,f_{m}\right\} , and q is the infimum of a chain \left\{ q_{\alpha}\right\} _{\alpha\in A}\subseteq\left\{ p\in\mathbb{R}\left[x_{1},...,x_{n}\right]\mid\exists f\in D\, pf-1\in D\right\} . For any \bar{x}\in\mathbb{R}^{n}, if f_{i}\left(\bar{x}\right)\geq0 for each i, then every element of D is nonnegative at \bar{x}; so for each \alpha, choosing f\in D with q_{\alpha}f-1\in D gives q_{\alpha}\left(\bar{x}\right)f\left(\bar{x}\right)\geq1, hence q_{\alpha}\left(\bar{x}\right)>0, and therefore q\left(\bar{x}\right)\geq0. That is, the finite set of inequalities \left\{ f_{i}\left(\bar{x}\right)\geq0\mid1\leq i\leq m\right\} \cup\left\{ q\left(\bar{x}\right)<0\right\}  is not simultaneously satisfied anywhere in \mathbb{R}^{n}. By the compactness theorem, there are no n-types satisfying all those inequalities. Given \Phi\in P\left(C\right), f_{i}\left(\Phi\right)\geq0 for each i, so q\left(\Phi\right)\nless0; that is, q\left(\Phi\right)\geq0. Hence q\in C\left(P\left(C\right)\right)=\text{Rad}\left(C\right).

Conversely, suppose q\in\text{Rad}\left(C\right). Then by the compactness theorem, there are some f_{1},...,f_{m}\in C such that q\in\text{Rad}\left(\left\langle f_{1},...,f_{m}\right\rangle \right). Then \forall\varepsilon>0, q+\varepsilon is strictly positive on P\left(\left\langle f_{1},...,f_{m}\right\rangle \right), and hence by Positivstellensatz 4, \exists f\in\left\langle f_{1},...,f_{m}\right\rangle  such that \left(q+\varepsilon\right)f-1\in\left\langle f_{1},...,f_{m}\right\rangle . That is, \left\{ q+\varepsilon\mid\varepsilon>0\right\} is a chain contained in \left\{ p\in\mathbb{R}\left[x_{1},...,x_{n}\right]\mid\exists f\in D\, pf-1\in D\right\} , where D=\left\langle f_{1},...,f_{m}\right\rangle  is a finitely-generated subcone of C, and its infimum is q. \square

Ordered commutative algebra

Even though they are technically not isomorphic, \mathbb{C}^{n} and \text{Spec}\left(\mathbb{C}\left[x_{1},...,x_{n}\right]\right) are closely related, and can often be used interchangeably. Of the two, \text{Spec}\left(\mathbb{C}\left[x_{1},...,x_{n}\right]\right) is of a form that can be more easily generalized to more abstruse situations in algebraic geometry, which may indicate that it is the better thing to talk about, whereas \mathbb{C}^{n} is merely the simpler thing that is easier to think about and just as good in practice in many contexts. In contrast, \mathbb{R}^{n} and \mathbb{OA}_{\mathbb{R}}^{n} are different in important ways. The situation in algebraic geometry provides further reason to pay more attention to \mathbb{OA}_{\mathbb{R}}^{n} than to \mathbb{R}^{n}.

The next thing to look for would be an analog of the spectrum of a ring for a partially ordered commutative ring (I will henceforth abbreviate "partially ordered commutative ring" as "ordered ring" in order to cut down on the profusion of adjectives) in a way that makes use of the order, and gives us \mathbb{OA}_{\mathbb{R}}^{n} when applied to \mathbb{R}\left[x_{1},...,x_{n}\right]. I will call it the order spectrum of an ordered ring A, denoted \text{OrdSpec}\left(A\right). Then \mathbb{OA}_{A}^{n} can be defined as \text{OrdSpec}\left(A\left[x_{1},...,x_{n}\right]\right). \text{OrdSpec}\left(A\right) should, of course, be the set of prime cones. But what even is a prime cone?

Definition: A cone \mathfrak{p}\subseteq A is prime if A/\mathfrak{p} is a totally ordered integral domain.

Definition: \text{OrdSpec}\left(A\right) is the set of prime cones in A, equipped with the topology whose closed sets are the sets of prime cones containing a given cone.

An n-type \Phi\in\mathbb{OA}_{\mathbb{R}}^{n} can be seen as a cone, by identifying it with \left\{ f\in\mathbb{R}\left[x_{1},...,x_{n}\right]\mid f\left(\Phi\right)\geq0\right\} , aka C\left(\left\{ \Phi\right\} \right). Under this identification, \mathbb{OA}_{\mathbb{R}}^{n}=\text{OrdSpec}\left(\mathbb{R}\left[x_{1},...,x_{n}\right]\right), as desired. The prime cones in \mathbb{R}\left[x_{1},...,x_{n}\right] are also the radical cones C such that P\left(C\right) is irreducible. Notice that irreducible subsets of \mathbb{OA}_{\mathbb{R}}^{n} are much smaller than irreducible subsets of \mathbb{A}_{\mathbb{C}}^{n}; in particular, none of them contain more than one element of \mathbb{R}^{n}.

There is also a natural notion of maximal cone.

Definition: A cone \mathfrak{m}\subseteq A is maximal if \mathfrak{m}\neq A and there are no strictly intermediate cones between \mathfrak{m} and A. Equivalently, if \mathfrak{m} is prime and closed in \text{OrdSpec}\left(A\right).

Maximal ideals of \mathbb{C}\left[x_{1},...,x_{n}\right] correspond to elements of \mathbb{C}^{n}. And the cones of elements of \mathbb{R}^{n} are maximal cones in \mathbb{R}\left[x_{1},...,x_{n}\right], but unlike in the complex case, these are not all the maximal cones, since there are closed points in \mathbb{OA}_{\mathbb{R}}^{n} outside of \mathbb{R}^{n}. For example, C_{\infty} is a maximal cone, and the type of numbers greater than all reals is closed. To characterize the cones of elements of \mathbb{R}^{n}, we need something slightly different.

Definition: A cone \mathfrak{m}\subseteq A is ideally maximal if A/\mathfrak{m} is a totally ordered field. Equivalently, if \mathfrak{m} is maximal and \mathfrak{m}^{\circ} is a maximal ideal.

Elements of \mathbb{R}^{n} correspond to ideally maximal cones of \mathbb{R}\left[x_{1},...,x_{n}\right].

\text{OrdSpec} also allows us to define the radical of a cone in an arbitrary partially ordered commutative ring.

Definition: For a cone C\subseteq A, \text{Rad}\left(C\right) is the intersection of all prime cones containing C. C is radical if C=\text{Rad}\left(C\right).

Conjecture: \text{Rad}\left(C\right) is the union over all finitely-generated subcones D\subseteq C of the closure of \left\{ p\in A\mid\exists f\in D\, pf-1\in D\right\}  (as before, the closure of a subset X\subseteq A is defined to be the set of all elements of A which are infima of chains contained in X).

Order schemes

Definition: An ordered ringed space is a topological space equipped with a sheaf of ordered rings. An ordered ring is local if it has a unique ideally maximal cone, and a locally ordered ringed space is an ordered ringed space whose stalks are local.

\text{OrdSpec}\left(A\right) can be equipped with a sheaf of ordered rings \mathcal{O}_{A}, making it a locally ordered ringed space.

Definition: For a prime cone \mathfrak{p}\subseteq A, the localization of A at \mathfrak{p}, denoted A_{\mathfrak{p}}, is the ring A_{\mathfrak{p}^{\circ}} equipped with an ordering that makes it a local ordered ring. This will be the stalk at \mathfrak{p} of \mathcal{O}_{A}. A fraction \frac{a}{b}\in A_{\mathfrak{p}} (b\notin\mathfrak{p}^{\circ}) is also an element of A_{\mathfrak{q}} for any prime cone \mathfrak{q}\subseteq A whose interior ideal does not contain b. The set of such \mathfrak{q} is an open neighborhood of \mathfrak{p} (its complement is the set of prime cones containing \left\langle b,-b\right\rangle ). There is a natural map A_{\mathfrak{p}}\rightarrow\text{Frac}\left(A/\mathfrak{p}\right) given by \frac{a}{b}\mapsto\frac{a+\mathfrak{p}^{\circ}}{b+\mathfrak{p}^{\circ}}, and the total order on A/\mathfrak{p} extends uniquely to a total order on the fraction field, so for a,b\in A_{\mathfrak{p}}, we can say that a\geq b at \mathfrak{p} if this is true of their images in \text{Frac}\left(A/\mathfrak{p}\right). We can then say that a\geq b near \mathfrak{p} if a\geq b at every point in some neighborhood of \mathfrak{p}, which defines the ordering on A_{\mathfrak{p}}.

Definition: For open U\subseteq\text{OrdSpec}\left(A\right), \mathcal{O}_{A}\left(U\right) consists of elements of \prod_{\mathfrak{p}\in U}A_{\mathfrak{p}} that are locally ratios of elements of A. \mathcal{O}_{A}\left(U\right) is ordered by a\geq b if and only if \forall\mathfrak{p}\in U, a\geq b near \mathfrak{p} (equivalently, if \forall\mathfrak{p}\in U, a\geq b at \mathfrak{p}).

A\subseteq\mathcal{O}_{A}\left(\text{OrdSpec}\left(A\right)\right), and this inclusion can be proper. Conjecture: \text{OrdSpec}\left(\mathcal{O}_{A}\left(U\right)\right)\cong U as locally ordered ringed spaces for open U\subseteq\text{OrdSpec}\left(A\right). This conjecture says that it makes sense to talk about whether or not a locally ordered ringed space looks locally like an order spectrum near a given point. Thus, if this conjecture is false, it would make the following definition look highly suspect.

Definition: An order scheme is a topological space X equipped with a sheaf of ordered commutative rings \mathcal{O}_{X} such that for some open cover of X, the restrictions of \mathcal{O}_{X} to the open sets in the cover are all isomorphic to order spectra of ordered commutative rings.

I don't have any uses in mind for order schemes, but then again, I don't know what ordinary schemes are for either and they are apparently useful, and order schemes seem like a natural analog of them.

Nonabelian modules

This is a rough overview of my thoughts on a thing I've been thinking about, and as such is incomplete and may contain errors. Proofs have been omitted when writing them out would be at all tedious.

Edit: It has been pointed out to me that near-ring modules have already been defined, and the objects I describe in this post are just near-ring modules where the near-ring happens to be a ring.

Introduction

As you all know (those of you who have the background for this post, anyway), an R-module is an abelian group M (written additively) together with a multiplication map R\times M\rightarrow M such that for all \alpha,\beta\in R and x,y\in M, \alpha\cdot\left(x+y\right)=\alpha\cdot x+\alpha\cdot y, \left(\alpha+\beta\right)\cdot x=\alpha\cdot x+\beta\cdot x, \left(\alpha\beta\right)\cdot x=\alpha\cdot\left(\beta\cdot x\right), and 1\cdot x=x.

What if we don't want to restrict attention to abelian groups? One could attempt to define a nonabelian module using the same axioms, but without the restriction that the group be abelian. As it is customary to write groups multiplicatively if they are not assumed to be abelian, we will do that, and the map R\times M\rightarrow M will be written as exponentiation (since exponents are written on the right, I'll follow the definition of right-modules, rather than left-modules). The axioms become: for all \alpha,\beta\in R and x,y\in M, \left(xy\right)^{\alpha}=x^{\alpha}y^{\alpha}, x^{\alpha+\beta}=x^{\alpha}x^{\beta}, x^{\left(\alpha\beta\right)}=\left(x^{\alpha}\right)^{\beta}, and x^{1}=x.

What has changed? Absolutely nothing, as it turns out: the first axiom still forces M to be abelian, because yx=x^{-1}\left(xy\right)^{2}y^{-1}=x^{-1}\left(x^{2}y^{2}\right)y^{-1}=xy. We'll have to get rid of that axiom. Our new definition, which it seems to me captures the essence of a module except for abelianness:
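The derivation above is easy to confirm by brute force in a small nonabelian group. The sketch below is an illustration, not from the original post: it uses the symmetric group S_{3}, with permutations stored as tuples, and checks that the identity yx=x^{-1}\left(xy\right)^{2}y^{-1} holds for every pair, and that \left(xy\right)^{2}=x^{2}y^{2} fails exactly for the pairs that do not commute, so the axiom \left(xy\right)^{\alpha}=x^{\alpha}y^{\alpha} with \alpha=2 already forces commutativity.

```python
from itertools import permutations

# The symmetric group S_3: a permutation p is a tuple with p[i] the image of i.
S3 = list(permutations(range(3)))
IDENTITY = (0, 1, 2)


def mul(p, q):
    """Group product, taken here as the composition p after q."""
    return tuple(p[q[i]] for i in range(len(q)))


def inv(p):
    """Inverse permutation."""
    result = [0] * len(p)
    for i, image in enumerate(p):
        result[image] = i
    return tuple(result)


def power(p, n):
    """p^n for a nonnegative integer n."""
    result = IDENTITY
    for _ in range(n):
        result = mul(result, p)
    return result


# Pairs where squaring is not multiplicative: (xy)^2 != x^2 y^2.
squaring_fails = {(x, y) for x in S3 for y in S3
                  if power(mul(x, y), 2) != mul(power(x, 2), power(y, 2))}
# Pairs that do not commute.
noncommuting = {(x, y) for x in S3 for y in S3 if mul(x, y) != mul(y, x)}
```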

A nonabelian R-module is a group M (written multiplicatively) together with a scalar exponentiation map R\times M\rightarrow M such that for all \alpha,\beta\in R and x\in M, x^{1}=x, x^{\alpha+\beta}=x^{\alpha}x^{\beta}, and x^{\left(\alpha\beta\right)}=\left(x^{\alpha}\right)^{\beta}.

These imply that x^{0}=1, 1^{\alpha}=1, and x^{-1} is the inverse of x, because x\cdot x^{0}=x^{1}x^{0}=x^{1+0}=x^{1}=x (so x^{0}=1 by cancellation), 1^{\alpha}=\left(1^{0}\right)^{\alpha}=1^{\left(0\alpha\right)}=1^{0}=1, and x\cdot x^{-1}=x^{1}x^{-1}=x^{1-1}=x^{0}=1.

Just like a \mathbb{Z}-module is just an abelian group, a nonabelian \mathbb{Z}-module is just a group. Just like a \mathbb{Z}/n\mathbb{Z}-module is an abelian group whose exponent divides n, a nonabelian \mathbb{Z}/n\mathbb{Z}-module is a group whose exponent divides n.

Exponentiation-like families of operations

Perhaps a bit more revealing is what nonabelian modules over free rings look like, since then the generators are completely generic ring elements. Where A is the generating set, a \mathbb{Z}\left\langle A\right\rangle -module is an abelian group together with endomorphisms \left\{ x\mapsto\alpha x\mid\alpha\in A\right\} , which tells us that modules are about endomorphisms of an abelian group indexed by the elements of a ring. Nonabelian modules are certainly not about endomorphisms. After all, in a nonabelian group, the map x\mapsto x^{2} is not an endomorphism. I will call the things that nonabelian modules are about "exponentiation-like families of operations", and give four equivalent definitions, in roughly increasing order of concreteness and decreasing order of elegance. Definition 2 uses basic model theory, so skip it if that scares you. Definition 3 is the "for dummies" version of definition 2.

Definition 0: Let G be a group, and let A be a family of functions from G to G (not necessarily endomorphisms). If G can be made into a nonabelian \mathbb{Z}\left\langle A\right\rangle -module such that x^{\alpha}=\alpha\left(x\right) for x\in G and \alpha\in A, then A is called an exponentiation-like family of operations on G. If so, the nonabelian \mathbb{Z}\left\langle A\right\rangle -module structure on G with that property is unique, so define x^{p} to be its value according to that structure, for p\in\mathbb{Z}\left\langle A\right\rangle  and x\in G.

Definition 1: A is an exponentiation-like family of operations on G if for all x\in G, the smallest subgroup containing x which is closed under actions by elements of A (which I will call \overline{\left\{ x\right\} }) is abelian, and the elements of A restrict to endomorphisms of it. Using the universal property of \mathbb{Z}\left\langle A\right\rangle , this induces a homomorphism \mathbb{Z}\left\langle A\right\rangle \rightarrow\text{End}\left(\overline{\left\{ x\right\} }\right)^{\text{op}}. Let x^{p} denote the action of p on x under that map, for p\in\mathbb{Z}\left\langle A\right\rangle . By \text{End}\left(\overline{\left\{ x\right\} }\right)^{\text{op}}, I mean the endomorphism ring of \overline{\left\{ x\right\} } with composition running in the opposite direction (i.e., the multiplication operation given by \left(f,g\right)\mapsto g\circ f). This is because of the convention that nonabelian modules are written as nonabelian right-modules by default.

Definition 2: Consider the language \mathcal{L}_{Rings}\sqcup A, where \mathcal{L}_{Rings}:=\left\{ 0,1,+,-,\cdot\right\} is the language of rings, and each element of A is used as a constant symbol. Closed terms in \mathcal{L}_{Rings}\sqcup A act as functions from G to G, with the action of t written as x\mapsto x^{t}, defined inductively as: x^{0}:=1, x^{1}:=x, x^{\alpha}:=\alpha\left(x\right) for \alpha\in A, x^{t+s}:=x^{t}x^{s}, x^{-t}:=\left(x^{t}\right)^{-1}, and x^{ts}:=\left(x^{t}\right)^{s} for closed \mathcal{L}_{Rings}\sqcup A-terms t and s. A is called an exponentiation-like family of operations on G if x^{t}=x^{s} whenever T_{Rings}\models t=s, where T_{Rings} is the theory of rings. If A is an exponentiation-like family of operations on G and p\in\mathbb{Z}\left\langle A\right\rangle  is a noncommutative polynomial with variables in A, then for x\in G, x^{p} is defined to be x^{t} where t is any term representing p.

Definition 3: Pick a total order on the free monoid on A (e.g. by ordering A and then using the lexicographic order). The order you use won't matter. Given x\in G and w:=\alpha_{1}...\alpha_{n} in the free monoid on A, let x^{w}=\alpha_{n}\left(...\alpha_{1}\left(x\right)\right). Where p\in\mathbb{Z}\left\langle A\right\rangle is a noncommutative polynomial, p=c_{1}w_{1}+...+c_{n}w_{n} for some c_{1},...,c_{n}\in\mathbb{Z} and some sequence w_{1},...,w_{n} of noncommutative monomials (elements of the free monoid on A) that is strictly decreasing in the chosen order. Let x^{p}=\left(x^{c_{1}}\right)^{w_{1}}...\left(x^{c_{n}}\right)^{w_{n}}. A is called an exponentiation-like family of operations on G if for every x\in G and p,q\in\mathbb{Z}\left\langle A\right\rangle, x^{pq}=\left(x^{p}\right)^{q} and x^{p+q}=x^{p}x^{q}.
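Definition 3 is concrete enough to implement directly. The following is a hypothetical illustration: a noncommutative polynomial is a dictionary from words (tuples of letters) to integer coefficients, monomials are taken in decreasing lexicographic order, and x^{p} is the ordered product \left(x^{c_{1}}\right)^{w_{1}}...\left(x^{c_{n}}\right)^{w_{n}}. As a test case it uses the Heisenberg group of dimension 3 over \mathbb{Z}/5\mathbb{Z} (with the group law given later in this post), with a and b the operations that scale every coordinate by 2 and by 3. Scalar exponentiation makes that group a nonabelian \mathbb{Z}/5\mathbb{Z}-module, so by the Facts stated after these definitions \left\{ a,b\right\} is exponentiation-like, and x^{p} should agree with scaling x by p evaluated at a=2, b=3. The modulus 5 and the scalars 2 and 3 are arbitrary choices.

```python
P = 5  # modulus of the example ring Z/5Z (an arbitrary choice)
IDENTITY = (0, 0, 0)


def mul(g, h):
    """Heisenberg group law over Z/PZ (dimension 3, i.e. n = 1)."""
    a1, b1, c1 = g
    a2, b2, c2 = h
    return ((a1 + a2) % P, (b1 + b2) % P, (c1 + c2 + a1 * b2 - a2 * b1) % P)


def inverse(g):
    return tuple(-v % P for v in g)


def int_power(g, n):
    """Ordinary group power g^n for any integer n."""
    base = g if n >= 0 else inverse(g)
    result = IDENTITY
    for _ in range(abs(n)):
        result = mul(result, base)
    return result


def scale(g, r):
    """Scalar exponentiation (a, b, c)^r = (ra, rb, rc)."""
    return tuple(r * v % P for v in g)


# The family A = {a, b}: the module operations x -> x^2 and x -> x^3.
OPS = {"a": lambda g: scale(g, 2), "b": lambda g: scale(g, 3)}


def word_action(x, word):
    """x^w for w = alpha_1...alpha_n, i.e. alpha_n(...alpha_1(x))."""
    for letter in word:
        x = OPS[letter](x)
    return x


def poly_action(x, poly):
    """x^p per Definition 3; poly maps words (tuples of letters) to integers.

    Monomials are multiplied in decreasing lexicographic order."""
    result = IDENTITY
    for word in sorted(poly, reverse=True):
        result = mul(result, word_action(int_power(x, poly[word]), word))
    return result


def poly_add(p, q):
    """Sum in Z<A>."""
    out = dict(p)
    for w, c in q.items():
        out[w] = out.get(w, 0) + c
    return out


def poly_mul(p, q):
    """Product in Z<A>: words concatenate, coefficients multiply."""
    out = {}
    for w1, c1 in p.items():
        for w2, c2 in q.items():
            out[w1 + w2] = out.get(w1 + w2, 0) + c1 * c2
    return out
```

For instance, with p=2ab-ba+3+4a, evaluating at a=2, b=3 gives 12-6+3+8=17, so x^{p} should equal scale(x, 17), even though the group itself is not abelian.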

These four definitions of exponentiation-like family are equivalent, and for exponentiation-like families, their definitions of exponentiation by a noncommutative polynomial are equivalent.

Facts: \emptyset is an exponentiation-like family of operations on G. If A is an exponentiation-like family of operations on G and B\subseteq A, then so is B. If G is abelian, then \text{End}\left(G\right) is exponentiation-like. Given a nonabelian R-module structure on G, the actions of the elements of R on G form an exponentiation-like family. In particular, if A is an exponentiation-like family of operations on G, then so is \mathbb{Z}\left\langle A\right\rangle , with the actions being defined as above.

[The following paragraph has been edited since this comment.]

For an abelian group A, the endomorphisms of A form a ring \text{End}\left(A\right), and an R-module structure on A is simply a homomorphism R\rightarrow\text{End}\left(A\right). Can we say a similar thing about exponentiation-like families of operations on G? Let \text{Exp}\left(G\right) be the set of all functions G\rightarrow G (as sets). Given \alpha,\beta\in\text{Exp}\left(G\right), let multiplication be given by composition: x^{\left(\alpha\beta\right)}=\left(x^{\alpha}\right)^{\beta}, addition be given by x^{\alpha+\beta}=x^{\alpha}x^{\beta}, negation be given by x^{-\alpha}=\left(x^{\alpha}\right)^{-1}, and 0 and 1 be given by x^{0}=1 and x^{1}=x. This makes \text{Exp}\left(G\right) into a near-ring. A nonabelian R-module structure on G is a homomorphism R\rightarrow\text{Exp}\left(G\right), and a set of operations on G is an exponentiation-like family of operations on G if and only if it is contained in a ring which is contained in \text{Exp}\left(G\right).
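The failure of one distributive law is what makes \text{Exp}\left(G\right) a near-ring rather than a ring, and it already shows up for G=S_{3}. The sketch below is illustrative only; the particular maps are arbitrary choices. It represents elements of \text{Exp}\left(G\right) as Python functions, defines addition and multiplication as above, and checks that \alpha\left(\beta+\gamma\right)=\alpha\beta+\alpha\gamma holds while \left(\beta+\gamma\right)\alpha=\beta\alpha+\gamma\alpha fails when \alpha is squaring, \beta is the identity map and \gamma is a constant map. Addition is not commutative either.

```python
from itertools import permutations

S3 = list(permutations(range(3)))


def gmul(p, q):
    """Product in S_3 (composition p after q)."""
    return tuple(p[q[i]] for i in range(len(q)))


def add(f, g):
    """x^(f+g) = x^f x^g."""
    return lambda x: gmul(f(x), g(x))


def times(f, g):
    """x^(fg) = (x^f)^g: apply f first, then g."""
    return lambda x: g(f(x))


def equal(f, g):
    """Elements of Exp(G) are equal if they agree on every element of G."""
    return all(f(x) == g(x) for x in S3)


def square(x):   # alpha: x -> x^2, not an endomorphism of S_3
    return gmul(x, x)


def ident(x):    # beta: x -> x
    return x


T = (1, 0, 2)    # a transposition


def const_t(x):  # gamma: the constant map with value T
    return T


left_distributive = equal(times(square, add(ident, const_t)),
                          add(times(square, ident), times(square, const_t)))
right_distributive = equal(times(add(ident, const_t), square),
                           add(times(ident, square), times(const_t, square)))
addition_commutes = equal(add(ident, const_t), add(const_t, ident))
```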

Some aimless rambling

What are some interesting examples of nonabelian modules that are not abelian? (That might sound redundant, but "nonabelian module" means that the requirement of abelianness has been removed, not that a requirement of nonabelianness has been imposed. Perhaps I should come up with better terminology. To make matters worse, since the requirement that got removed is actually stronger than abelianness, there are nonabelian modules that are abelian and not modules. For instance, consider the nonabelian \mathbb{Z}\left[\alpha\right]-module whose underlying set is the Klein four group (generated by two elements a,b) such that a^{\alpha}=a, b^{\alpha}=b, and \left(ab\right)^{\alpha}=1.)
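The Klein four example can be checked mechanically against Definition 1. In the sketch below (an illustration; encoding the Klein four group as pairs of bits under coordinatewise XOR, with a=(1,0) and b=(0,1), is a choice made here), the closure of each \left\{ x\right\} under multiplication and \alpha is computed and \alpha is checked to restrict to an endomorphism of it. The group is abelian, so the closures are too, and hence \left\{ \alpha\right\} is exponentiation-like. Yet \alpha is not an endomorphism of the whole group, since a^{\alpha}b^{\alpha}=ab\neq1=\left(ab\right)^{\alpha}.

```python
from itertools import product

# Klein four group as pairs of bits; the group operation is coordinatewise XOR.
E, A, B, AB = (0, 0), (1, 0), (0, 1), (1, 1)
G = [E, A, B, AB]


def mul(x, y):
    return (x[0] ^ y[0], x[1] ^ y[1])


# The operation alpha from the text: a -> a, b -> b, ab -> 1 (and 1 -> 1).
ALPHA = {E: E, A: A, B: B, AB: E}


def closure(x):
    """Smallest subgroup containing x that is closed under alpha."""
    elements = {E, x}
    while True:
        new = {mul(u, v) for u, v in product(elements, repeat=2)}
        new |= {ALPHA[u] for u in elements}
        if new <= elements:
            return elements
        elements |= new


def is_endomorphism(f, elements):
    """Does f preserve products on the given subset (closed under mul)?"""
    return all(f[mul(u, v)] == mul(f[u], f[v])
               for u, v in product(elements, repeat=2))


exponentiation_like = all(is_endomorphism(ALPHA, closure(x)) for x in G)
endomorphism_of_G = is_endomorphism(ALPHA, G)
```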

In particular, what do free nonabelian modules look like? The free nonabelian \mathbb{Z}-modules are, of course, free groups. The free nonabelian \mathbb{Z}/n\mathbb{Z}-modules have been studied in combinatorial group theory; they're called Burnside groups. (Fun but tangential fact: not all Burnside groups are finite (the Burnside problem), but despite this, the category of finite nonabelian \mathbb{Z}/n\mathbb{Z}-modules has free objects on any finite generating set, called Restricted Burnside groups.)

The free nonabelian \mathbb{Z}\left[\alpha\right]-modules are monstrosities. They can be constructed in the usual way of constructing free objects in a variety of algebraic structures, but that construction seems not to be very enlightening about their structure. So I'll give a somewhat more direct construction of the free nonabelian \mathbb{Z}\left[\alpha\right]-module on d generators, which may also not be that enlightening, and which is only suspected to be correct. Define an increasing sequence of groups G_{n}, and functions \alpha_{n}:G_{n}\rightarrow G_{n+1}, as follows: G_{0} is the free group on d generators. Given G_{n}, and given a subgroup X\leq G_{n}, let the top-degree portion of X be \alpha_{n-1}^{k}\left(X\right) for the largest k such that this is nontrivial. Let H_{n} be the free product of the top-degree portions of maximal abelian subgroups of G_{n}. Let G_{n+1} be the free product of G_{n} with H_{n} modulo commutativity of the maximal abelian subgroups of G_{n} with the images of their top-degree portions in H_{n}. Given a maximal abelian subgroup X\leq G_{n}, let \alpha_{n}\restriction_{X} be the homomorphism extending \alpha_{n-1}\restriction_{X\cap G_{n-1}} which sends the top-degree portion identically onto its image in H_{n}. Since every non-identity element of G_{n} is in a unique maximal abelian subgroup, this defines \alpha_{n}. G:=\bigcup_{n}G_{n} with \alpha:=\bigcup_{n}\alpha_{n} is the free nonabelian \mathbb{Z}\left[\alpha\right]-module on d generators. If A is a set, the free nonabelian \mathbb{Z}\left\langle A\right\rangle -modules can be constructed similarly, with \left|A\right| copies of H_{n} at each step. Are these constructions even correct? Are there nicer ones?

A nonabelian \mathbb{Z}\left[\frac{1}{2}\right]-module would be a group with a formal square root operation. As an example, any group of odd exponent n can be made into a nonabelian \mathbb{Z}\left[\frac{1}{2}\right]-module in a canonical way by letting x^{\frac{1}{2}}=x^{\frac{n+1}{2}} (indeed \left(x^{\frac{n+1}{2}}\right)^{2}=x^{n+1}=x). More generally, any group of finite exponent n can be made into a nonabelian \mathbb{Z}\left[\left\{ p^{-1}|p\nmid n\right\} \right]-module in a similar fashion. Are there any more nice examples of nonabelian modules over localizations of \mathbb{Z}?
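These canonical roots are easy to test on a small nonabelian example. The sketch below is an illustration, with the group and helper names chosen here. It uses the Heisenberg group of dimension 3 over \mathbb{Z}/3\mathbb{Z}, which is nonabelian of exponent 3, and interprets a rational exponent r with denominator d prime to n as x^{r}=x^{m} for m\equiv\text{num}\left(r\right)\cdot d^{-1}\pmod n. For r=\frac{1}{2} this gives x^{\frac{n+1}{2}}, and the same rule gives the more general \mathbb{Z}\left[\left\{ p^{-1}|p\nmid n\right\} \right]-module structure.

```python
from fractions import Fraction
from itertools import product

N = 3  # exponent of the example group


def mul(g, h):
    """Heisenberg group law over Z/3Z (dimension 3)."""
    a1, b1, c1 = g
    a2, b2, c2 = h
    return ((a1 + a2) % N, (b1 + b2) % N, (c1 + c2 + a1 * b2 - a2 * b1) % N)


IDENTITY = (0, 0, 0)
G = list(product(range(N), repeat=3))


def int_power(g, m):
    """g^m for a nonnegative integer m."""
    result = IDENTITY
    for _ in range(m):
        result = mul(result, g)
    return result


def rational_power(g, r):
    """x^r for a rational r whose denominator is invertible mod N.

    Since x^N = 1, only r mod N matters, and r corresponds to
    numerator * denominator^(-1) mod N (pow(d, -1, N) needs Python 3.8+)."""
    r = Fraction(r)
    m = r.numerator * pow(r.denominator, -1, N) % N
    return int_power(g, m)
```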

In particular, a nonabelian \mathbb{Q}-module would be a group with formal nth root operations for all n. What are some nonabelian examples of these? Note that nonabelian \mathbb{Q}-modules cannot have any torsion, for suppose x^{n}=1 for some n\neq0. Then x=\left(x^{n}\right)^{\frac{1}{n}}=1^{\frac{1}{n}}=1. More generally, a nonabelian module has no nontrivial n-torsion (that is, x^{n}=1\implies x=1) for any n which is invertible in the scalar ring.

The free nonabelian \mathbb{Z}\left[\frac{1}{m}\right]-modules can be constructed similarly to the construction of free nonabelian \mathbb{Z}\left[\alpha\right]-modules above, except that when constructing G_{n+1} from G_{n} and H_{n}, we also mod out by elements of G_{n} being equal to the mth powers of their images in H_{n}. Using the fact that \mathbb{Q}=\mathbb{Z}\left[\left\{ p^{-1}|\text{primes }p\right\} \right] is generated by the inverses of the primes, this lets us modify the construction of free nonabelian \mathbb{Z}\left\langle A\right\rangle -modules to give us a construction of free nonabelian \mathbb{Q}-modules. Again, is there a nicer way to do it?

Topological nonabelian modules

It is also interesting to consider topological nonabelian modules over topological rings; that is, nonabelian modules endowed with a topology such that the group operation and scalar exponentiation are continuous. A module over a topological ring has a canonical finest topology on it, and the same remains true for nonabelian modules. For finite-dimensional real vector spaces, this is the only Hausdorff topology compatible with the vector space structure. Does the same remain true for finitely-generated nonabelian \mathbb{R}-modules? Finite-dimensional real vector spaces are also complete. Topological nonabelian modules are, in particular, topological groups, so they can be made into uniform spaces and the notion of completeness still makes sense for them, but I think some finitely-generated nonabelian \mathbb{R}-modules are not complete.

A topological nonabelian \mathbb{R}-module is a sort of Lie group-like object. One might try constructing a Lie algebra for a complete nonabelian \mathbb{R}-module M by letting the underlying set be M, and defining x+y=\lim_{\varepsilon\rightarrow0}\left(x^{\varepsilon}y^{\varepsilon}\right)^{\left(\varepsilon^{-1}\right)} and \left[x,y\right]=\lim_{\varepsilon\rightarrow0}\left(x^{\varepsilon}y^{\varepsilon}x^{-\varepsilon}y^{-\varepsilon}\right)^{\left(\varepsilon^{-2}\right)}. One might try putting a differential structure on M such that this is the Lie algebra of left-invariant derivations. Does this or something like it work?
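For the continuous Heisenberg group (defined below in this post), these limits can be evaluated directly. This is only a sanity check, not a proof: the sketch uses exact rational arithmetic via Fraction as a stand-in for real numbers, takes the three-dimensional case n=1, and evaluates the expressions inside both limits for a small \varepsilon. Writing x=\left(a_{1},b_{1},c_{1}\right), y=\left(a_{2},b_{2},c_{2}\right) and \omega=a_{1}b_{2}-a_{2}b_{1}, a short calculation gives \left(x^{\varepsilon}y^{\varepsilon}\right)^{\left(\varepsilon^{-1}\right)}=\left(a_{1}+a_{2},b_{1}+b_{2},c_{1}+c_{2}+\varepsilon\omega\right), which tends to the coordinatewise sum, and x^{\varepsilon}y^{\varepsilon}x^{-\varepsilon}y^{-\varepsilon}=\left(0,0,2\varepsilon^{2}\omega\right), so the bracket is exactly \left(0,0,2\omega\right) for every \varepsilon.

```python
from fractions import Fraction


def mul(g, h):
    """Heisenberg group law (n = 1): (a1,b1,c1)*(a2,b2,c2)."""
    a1, b1, c1 = g
    a2, b2, c2 = h
    return (a1 + a2, b1 + b2, c1 + c2 + a1 * b2 - a2 * b1)


def power(g, alpha):
    """Scalar exponentiation (a, b, c)^alpha = (alpha a, alpha b, alpha c)."""
    return tuple(alpha * v for v in g)


def approx_sum(x, y, eps):
    """(x^eps y^eps)^(1/eps); its limit as eps -> 0 is the proposed x + y."""
    return power(mul(power(x, eps), power(y, eps)), 1 / eps)


def approx_bracket(x, y, eps):
    """(x^eps y^eps x^-eps y^-eps)^(1/eps^2); its limit is the proposed [x, y]."""
    commutator = mul(mul(power(x, eps), power(y, eps)),
                     mul(power(x, -eps), power(y, -eps)))
    return power(commutator, 1 / eps ** 2)


x = (Fraction(1), Fraction(2), Fraction(3))
y = (Fraction(4), Fraction(5), Fraction(6))
eps = Fraction(1, 1000)
print(approx_sum(x, y, eps))      # third coordinate is 9 - 3*eps, approaching 9
print(approx_bracket(x, y, eps))  # equals (0, 0, -6) = (0, 0, 2*omega), omega = -3
```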

A Lie group is a nonabelian \mathbb{R}-module if and only if its exponential map is a bijection between it and its Lie algebra. In this case, scalar exponentiation is closely related to the exponential map by a compelling formula: x^{\alpha}=\exp\left(\alpha\exp^{-1}\left(x\right)\right). As an example, the continuous Heisenberg group is a nonabelian \mathbb{R}-module which is not abelian. This observation actually suggests a nice class of examples of nonabelian modules without a topology: given a commutative ring R, the Heisenberg group over R is a nonabelian R-module.

The Heisenberg group of dimension 2n+1 over a commutative ring R has underlying set R^{n}\times R^{n}\times R^{1}, with the group operation given by \left(\boldsymbol{a}_{1},\boldsymbol{b}_{1},c_{1}\right)*\left(\boldsymbol{a}_{2},\boldsymbol{b}_{2},c_{2}\right):=\left(\boldsymbol{a}_{1}+\boldsymbol{a}_{2},\boldsymbol{b}_{1}+\boldsymbol{b}_{2},c_{1}+c_{2}+\boldsymbol{a}_{1}\cdot\boldsymbol{b}_{2}-\boldsymbol{a}_{2}\cdot\boldsymbol{b}_{1}\right). The continuous Heisenberg group means the Heisenberg group over \mathbb{R}. Scalar exponentiation on a Heisenberg group is just given by scalar multiplication: \left(\boldsymbol{a},\boldsymbol{b},c\right)^{\alpha}:=\left(\alpha\boldsymbol{a},\alpha\boldsymbol{b},\alpha c\right).
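The module axioms for this scalar exponentiation can be spot-checked. The sketch below is an illustration with arbitrarily chosen sample points, using Fraction in place of real numbers and n=2. It checks x^{1}=x, x^{\alpha+\beta}=x^{\alpha}x^{\beta} and x^{\left(\alpha\beta\right)}=\left(x^{\alpha}\right)^{\beta}, and confirms that the group is not abelian. The additivity axiom holds because the correction term \boldsymbol{a}_{1}\cdot\boldsymbol{b}_{2}-\boldsymbol{a}_{2}\cdot\boldsymbol{b}_{1} vanishes when both factors are scalar multiples of the same element.

```python
from fractions import Fraction
from itertools import product


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def mul(g, h):
    """Heisenberg group law on R^n x R^n x R."""
    a1, b1, c1 = g
    a2, b2, c2 = h
    return (tuple(p + q for p, q in zip(a1, a2)),
            tuple(p + q for p, q in zip(b1, b2)),
            c1 + c2 + dot(a1, b2) - dot(a2, b1))


def power(g, alpha):
    """Scalar exponentiation (a, b, c)^alpha = (alpha a, alpha b, alpha c)."""
    a, b, c = g
    return (tuple(alpha * v for v in a), tuple(alpha * v for v in b), alpha * c)


F = Fraction
samples = [
    ((F(1), F(-2)), (F(3), F(0)), F(5)),
    ((F(1, 2), F(4)), (F(-1), F(7, 3)), F(-2)),
]
scalars = [F(0), F(1), F(-3), F(2, 5), F(7, 4)]

axioms_hold = all(
    power(x, 1) == x
    and power(x, s + t) == mul(power(x, s), power(x, t))
    and power(x, s * t) == power(power(x, s), t)
    for x in samples for s, t in product(scalars, repeat=2)
)
abelian = all(mul(g, h) == mul(h, g) for g, h in product(samples, repeat=2))
```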

Advantages of Stochocracy

In the recent UK general election, the proportion of the popular vote won by each party looked like this:

[Figure: share of the popular vote won by each party]

The proportion of the seats in the House of Commons won by each party looked like this:

[Figure: share of the seats in the House of Commons won by each party]

You may notice that these two graphs look pretty different. The Conservative Party won a majority of the seats with only 37% of the vote. The Scottish National Party, due to its geographic concentration in Scotland, got a share of the seats in Parliament nearly double its share of the popular vote. In contrast, the Liberal Democrats got only 8 out of the 650 seats despite winning 8% of the popular vote, the Green Party got only 1 seat with 4% of the vote, and, most egregiously, the UK Independence Party got only 1 seat with 12% of the vote. While I have no objections to UKIP getting cheated out of political power, this does not seem like a fair and democratic outcome, and yet this sort of thing is an inevitable consequence of first-past-the-post elections for single-member legislative districts.

This sort of thing happens even in two-party systems like that of the US, where not only do third parties get completely shut out, but the balance of power between the two major parties is also skewed: in 2012, the Republicans won a majority of seats in the House of Representatives, despite the Democrats winning more total votes. This has been widely attributed to gerrymandering, but the fact that Democratic votes are more geographically concentrated than Republican votes contributes even more.

Proportional representation is an easy solution to this problem. One of the criticisms of proportional representation is that it loses the link between representatives and districts, and having representatives associated with a district ties them closer to the voters. There are variants of proportional representation that address this, but here I want to propose another one. But first, let's talk about random sampling.

Randomized Vote-Counting

If instead of counting every vote in an election, we randomly sampled some small fraction of the ballots and counted those, we would get the same result almost every time, with discrepancies being statistically possible only when the vote is very close (which seems fine to me; getting 50.1% of the vote does not seem to me to confer much more legitimacy than getting 49.9% of the vote does). While this might make elections slightly cheaper to administer, it would be a massive under-use of the power of randomization.
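How close "very close" has to be depends on the sample size. Treating each sampled ballot as an independent draw (a binomial model, which is a reasonable approximation when the sample is a small fraction of a large electorate; this modelling choice is an assumption made here), the probability that a sample of n ballots picks the same winner as a full count in a two-candidate race, where the winner's true vote share is p, is P\left(X>n/2\right) for X\sim\text{Binomial}\left(n,p\right). The sketch below evaluates that sum in log space to avoid floating-point underflow.

```python
from math import exp, lgamma, log


def prob_sample_agrees(p, n):
    """Probability that a binomial sample of n ballots (n odd, so no ties)
    gives a majority to the candidate whose true vote share is p (0 < p < 1)."""
    if not 0 < p < 1 or n % 2 == 0:
        raise ValueError("need 0 < p < 1 and odd n")
    log_p, log_q = log(p), log(1 - p)
    total = 0.0
    for k in range(n // 2 + 1, n + 1):
        log_term = (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
                    + k * log_p + (n - k) * log_q)
        total += exp(log_term)
    return total


for share in (0.501, 0.51, 0.55):
    print(share, prob_sample_agrees(share, 10001))
```

Under this model, with 10001 sampled ballots a 55% to 45% race comes out the same essentially every time, while a 50.1% to 49.9% race is only somewhat better than a coin flip (roughly 58%).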

In California (and also, I am under the impression, in most other U.S. states, and many other countries), elections tend to include a massive profusion of state and local officials and ballot measures. Each of these individually requires a significant amount of research, so it is prohibitively time-consuming to adequately research every ballot question. When I vote, I usually feel like I did not have time to research most of the ballot questions enough to make an informed vote, and yet I suspect I still spend more time researching them than average.

My proposed solution to this is for each voter to be randomly assigned one ballot question that they get to vote on. When you only have one issue you can vote on, it is much easier to thoroughly research it. Thus this system should result in more informed voters.

People would likely object that this system is undemocratic, since not everyone gets to vote on each ballot question. But in fact, this system would probably end up being more democratic than the current one, since it would make voting easier and thus probably increase turnout, making the sample of people voting on each issue more representative of the electorate as a whole, even while comprising a smaller fraction of it. Some people might not vote because of being assigned to an issue that they are not interested in, but most such people probably wouldn't have voted on that issue anyway; I'd bet they would be significantly outnumbered by people who would vote on an issue if it were the only one they could vote on, but wouldn't vote on it if they could vote on all of them. Furthermore, since people would have more time to research their ballot question, they would be less reliant on information shoved in their faces in the form of advertising, so this would decrease the influence of special interest groups in politics, arguably also making the system more democratic.

What about just one?

So far I've been suggesting that enough votes should be sampled that the outcome of each election is virtually guaranteed to be the same as it would be if all the votes were counted, with all the exceptions being when the vote is very close. But what happens when we don't sample enough votes for that? What if we take it all the way to the extreme and only sample one vote? This would not be appropriate for ballot measures or executive officials, but for electing the members of a legislature with a large number of single-member districts, this actually has some pretty nice properties.

Since each ballot is equally likely to be the one that gets counted, the probability of each candidate getting elected is proportional to the number of votes they get. Averaged over a large number of districts, this means that the number of legislators elected from each political party will be approximately proportional to the popular support for that party. Thus, this simulates proportional representation with single-member electoral districts.
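A small simulation makes the proportionality concrete. This is a sketch under assumed numbers: a hypothetical electorate of 1,000 equal-sized districts and two parties, with party "A" winning 80% in half the districts and 40% in the other half (60% overall). Plurality counting is included only for comparison:

```python
import random
from collections import Counter


def random_ballot_winner(ballots, rng):
    """Draw one ballot uniformly at random; the candidate on it wins the seat."""
    return rng.choice(ballots)


def random_ballot_seats(districts, rng):
    """Seats won by each party when every district is decided by one random ballot."""
    return Counter(random_ballot_winner(ballots, rng) for ballots in districts)


def plurality_seats(districts):
    """Seats won by each party under first-past-the-post, for comparison."""
    return Counter(Counter(ballots).most_common(1)[0][0] for ballots in districts)


# Hypothetical electorate: 1,000 districts of 100 voters each.
districts = (
    [["A"] * 80 + ["B"] * 20 for _ in range(500)]
    + [["A"] * 40 + ["B"] * 60 for _ in range(500)]
)
rng = random.Random(1)
seats = random_ballot_seats(districts, rng)
fptp = plurality_seats(districts)
print("random ballot:", seats)
print("plurality:    ", fptp)
```

The random-ballot seat counts should land near the 600/400 split of the vote, while plurality counting gives exactly 500/500 despite A's 60% of the vote.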

This is very similar to a sortition, in which the legislature is a random sample of the population. The primary difference is that in a sortition, many of the people randomly selected to be legislators may have little interest in or aptitude for the job. In this system, however, someone would still have to demonstrate interest by running for election in order to be selected. To further discourage frivolous campaigns, costs could be imposed on candidates (for instance, requiring them to gather signatures to qualify for the ballot), so that no one gets elected who isn't serious about serving in the legislature.

A small further advantage of this system over a sortition is that it ensures that the legislators are evenly distributed geographically, so the variance of the number of seats won by each political coalition would be slightly smaller than it would be under completely random sampling.
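The variance claim can be checked with a short calculation. This sketch assumes equal-sized districts, a single coalition with vote share `shares[d]` in district d, and reads "completely random sampling" as drawing every seat's ballot from the whole electorate with replacement. Per-district draws are independent Bernoulli trials, so their variances add:

```python
def variance_by_district(shares):
    """Seat-count variance when each district elects by its own random ballot."""
    return sum(p * (1 - p) for p in shares)


def variance_fully_random(shares):
    """Seat-count variance when every seat is drawn from the whole electorate."""
    n = len(shares)
    p_bar = sum(shares) / n  # overall vote share (equal-sized districts assumed)
    return n * p_bar * (1 - p_bar)


# Hypothetical coalition: 80% support in 500 districts, 40% in another 500.
shares = [0.8] * 500 + [0.4] * 500
by_district = variance_by_district(shares)
fully_random = variance_fully_random(shares)
p_bar = sum(shares) / len(shares)
gap = sum((p - p_bar) ** 2 for p in shares)
print(by_district, fully_random, gap)
```

Here the variances are about 200 and 240. In general the difference is the sum of squared deviations of the district shares from the overall share, so it is zero only when every district votes identically and stays modest when district shares are fairly similar.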

Another advantage of my randomized system over first-past-the-post and proportional representation is that it avoids electoral paradoxes that plague deterministic systems. It avoids the Alabama and population paradoxes, which some methods of proportional representation are vulnerable to. There is also no incentive for tactical voting, since if your vote is the one selected, the other ballots do not constrain which candidates you can get elected. And there is no incentive for gerrymandering, since the expected number of seats won by a party will be proportional to its vote count no matter how the districts are drawn, provided they all have equal numbers of voters.
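The gerrymandering point follows from linearity of expectation: a party's expected seats are the sum of its vote shares across districts, which for equal-sized districts depends only on its total vote. The sketch below uses a hypothetical 15-voter electorate (8 for "A", 7 for "B") split into three 5-voter districts in two different ways:

```python
from collections import Counter
from fractions import Fraction


def expected_seats(districts, party):
    """Expected random-ballot seats: the sum of the party's vote share per district."""
    return sum(Fraction(ballots.count(party), len(ballots)) for ballots in districts)


def plurality_seats(districts, party):
    """Seats won under first-past-the-post (two parties, odd-sized districts: no ties)."""
    return sum(Counter(ballots).most_common(1)[0][0] == party for ballots in districts)


# The same 15 hypothetical voters, districted two ways.
b_packed = [["A"] * 5, ["A"] * 3 + ["B"] * 2, ["B"] * 5]
a_packed = [["A"] * 2 + ["B"] * 3, ["A"] * 2 + ["B"] * 3, ["A"] * 4 + ["B"]]

for name, plan in (("B packed", b_packed), ("A packed", a_packed)):
    print(name, expected_seats(plan, "A"), plurality_seats(plan, "A"))
```

Under either map, A's expected seat count is 8/5 (three seats times an 8/15 vote share), while its plurality seat count swings from 2 to 1 depending on how the lines are drawn.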

A possible objection to this system is that candidates can get elected with support from only a small fraction of their constituents. But this does not seem that bad to me. Even under first-past-the-post, it is the norm for a large fraction of the constituents to vote against the winning candidate. Even in safe seats, the fraction of voters who vote against the winner is typically fairly significant (e.g., a third), and these voters never get the chance to be represented by their preferred candidate. Under the randomized system, any significant local coalition would get a chance of representation sometimes, and dominant coalitions would be represented most of the time. And if a party is dominant in a district, then even if the representative for that district ends up not being aligned with that party, there will likely be nearby districts that are represented by that party. For example, the San Francisco Bay Area is so dominated by Democrats that all of its members of the House of Representatives and the state legislature are Democrats, leaving the Bay Area Republicans unrepresented. Under the randomized system, a few Republicans would get elected in the Bay Area, so the constituency of Bay Area Republicans gets its representation, and the Democrats in the districts that end up represented by Republicans would still have plenty of Democratic legislators in neighboring districts to represent their interests.

One significant disadvantage is that it would be difficult for legislators to accumulate much experience in the legislature, since they would have a significant chance of losing each re-election even if they have broad support in their district. Primarily for this reason, I think this randomized system is inferior to single transferable vote and party list proportional representation. But despite this, I still think it is not too terrible, and would be a significant improvement over the current system. Sometimes you can make the system more democratic by counting fewer votes.
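To put a rough number on the experience problem, here is a sketch that assumes an incumbent runs every time, keeps the same vote share `support` in each election, and that each random draw is independent of the last. Under those assumptions, consecutive wins follow a geometric distribution:

```python
def chance_of_serving(support, terms):
    """Probability of winning `terms` consecutive random-ballot elections."""
    return support ** terms


def expected_consecutive_terms(support):
    """Expected number of consecutive terms served, given the first election was won.

    Each further term is won with probability `support`, so the run length is
    geometric with mean 1 / (1 - support).
    """
    return 1 / (1 - support)


for support in (0.6, 0.75):
    print(support, chance_of_serving(support, 5), expected_consecutive_terms(support))
```

With 60% support, the chance of winning five elections in a row is 0.6^5 ≈ 0.078, and the expected stretch in office is 2.5 terms, even though that candidate would win every one of those elections under first-past-the-post.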