The great challenges of our time, San Patrignano 2015
I was interviewed earlier today by Richard Adhikari, a journalist at TechNewsWorld, about an Artificial Intelligence project. I didn’t know anything about that project except what would be the title of the article: “AI Program Thinks Like a 4-Year-Old”.
There is an excellent summary of what I had told the journalist:
“I’m always suspicious of this kind of thing where they’re dealing with children,” anthropologist and sociologist Paul Jorion told TechNewsWorld. “I always have the feeling that there are some major issues they haven’t been able to solve yet.”
Jorion developed ANELLA, the Associative Network with Emerging Logical and Learning Abilities, whose intelligence was guided by the dynamics of affect, or feeling, back in 1989 for the artificial intelligence unit of British Telecom.
Most of the approaches toward AI “have taken an over-sophisticated view of the problem,” Jorion said. His, on the other hand, was “very simple — I’ve got a universe of words, and you just find a way to connect them that makes sense.”
Now talking about Eddie, the four-year-old toddler developed at the Rensselaer Polytechnic Institute in New York State by a team led by Selmer Bringsjord, the article explains that
To test Eddie’s reasoning powers, the group created a demo in Second Life in which Eddie was shown someone placing an object in one location then leaving the virtual room, followed by a second person who moved the object to another location in the room. Eddie was then asked where the first person would look for the object when he got back. Eddie’s response was the first location — incorrect, but typical of a four-year-old child in the real world.
Hmm, what did I tell you!
Thought as word dynamics. II. Architecture (5)
Mathematically speaking, a graph is a set of ordered pairs. It can be decomposed into elementary units, the individual pairs. Each of the two elements in an ordered pair represents a node in the graph and every pair represents an edge. Say there are four nodes, A, B, C and D, and the graph is defined as (A, B), (B, C), (C, D) and (A, C): the corresponding graph is as represented in the figure below.
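The four-node example can be written down directly as a set of ordered pairs. This is only a minimal sketch: the variable names and the `successors` helper are illustrative assumptions, not part of the original model.

```python
# The example graph from the text, stored literally as a set of ordered pairs.
edges = {("A", "B"), ("B", "C"), ("C", "D"), ("A", "C")}

# The nodes are simply every element that appears in at least one pair.
nodes = {node for pair in edges for node in pair}

def successors(node, edges):
    """Return, in sorted order, the nodes reached by an edge leaving `node`."""
    return sorted(target for source, target in edges if source == node)

print(sorted(nodes))           # the four nodes
print(successors("A", edges))  # the nodes A points to
```

Nothing beyond the pairs themselves needs to be stored: the node set is derived from them, which is the sense in which the pair is the elementary unit.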
An example of a word-pair is “cat-feline” and each of the two words, “cat” and “feline,” can be part of more than one such pair: “feline” may be associated again, this time with “mammal,” and “cat” with “whiskers,” etc. The individual units of the network we’re talking about here are thus pairs of associated nodes, each standing for a “word-pair.”
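The same idea can be sketched with the word-pairs quoted above. The index `pairs_by_word` is a hypothetical helper introduced here for illustration; it simply shows that one word can take part in several pairs while the pair remains the stored unit.

```python
from collections import defaultdict

# The word-pairs mentioned in the text; each pair is one unit of the network.
word_pairs = [("cat", "feline"), ("feline", "mammal"), ("cat", "whiskers")]

# Index every pair under each of the two words it contains.
pairs_by_word = defaultdict(list)
for pair in word_pairs:
    for word in pair:
        pairs_by_word[word].append(pair)

print(pairs_by_word["cat"])     # every pair in which "cat" takes part
print(pairs_by_word["feline"])  # every pair in which "feline" takes part
```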
The origin of the medieval notion of the “categoreme” introduced in Section 4 lies in Aristotle’s short treatise called “Categories,” devoted to those words that can act as either a subject or a predicate in a sentence. “Blue” is predicated of the subject “violets” when I say that “violets are blue.” “Color” is predicated of “blue,” the subject, when I say that “Blue is a color.” It is clear that the words so distinguished as being able to act either as subject or as predicate amount to those I called in Section 4 “content-words.” Why call them then “categoremes”? Because, Aristotle argues, they can be used in ten different ways, acting in ten different functions, being also the ten standpoints from which “stuffs” can be envisaged. These are the “various meanings of being” which he calls the categories.
Here is his explanation of this in his own terms: “Expressions which are in no way composite signify substance, quantity, quality, relation, place, time, position, state, action, or affection. To sketch my meaning roughly, examples of substance are ‘man’ or ‘the horse’, of quantity, such terms as ‘two cubits long’ or ‘three cubits long’, of quality, such attributes as ‘white’, ‘grammatical’. ‘Double’, ‘half’, ‘greater’, fall under the category of relation; ‘at the market place’, ‘in the Lyceum’, under that of place; ‘yesterday’, ‘last year’, under that of time. ‘Lying’, ‘sitting’, are terms indicating position, ‘shod’, ‘armed’, state; ‘to lance’, ‘to cauterize’, action; ‘to be lanced’, ‘to be cauterized’, affection. No one of these terms, in and by itself, involves an affirmation; it is by the combination of such terms that positive or negative statements arise. For every assertion must, as is admitted, be either true or false, whereas expressions which are not in any way composite such as ‘man’, ‘white’, ‘runs’, ‘wins’, cannot be either true or false” (Aristotle, Categories, IV).
The final words in the passage just quoted are most important: isolated terms, terms taken on their own, cannot be regarded as either true or false: “it is by the combination of such terms that positive or negative statements arise.” It is possible to go even one step further: does a term in isolation mean anything? “Of course,” one is tempted to say; indeed, as I said earlier, we’re at no loss when asked to define a term like “rose.” I gave as an example of doing precisely that: “a rose is a flower that has many petals, often pink, a strong and very pleasant fragrance, a thorny stem.” We spontaneously assigned to the rose the category of substance, of being a flower; we assigned quantity to its petals for being many; we attributed the quality of being pink to its petals, etc. In other words, we brought the rose out of its isolation by connecting it with other words in sentences of which, as Aristotle observed, it will then be possible to say if they are true or false.
Out of the examples the Greek philosopher mentions, it is self-evident that “double,” “half,” “greater,” “two cubits long,” “lying,” “sitting,” “shod,” “armed,” “runs,” “wins” have no meaning unless they are said – predicated – of something else. But with a moment of reflection it becomes obvious that this applies to the other words too: “man,” “horse,” “white.” As we’ve just seen when examining what is called the definition of a rose, these words also need to be said of something to come alive. In a passage of one of his dialogues, The Sophist, Plato has the Stranger from Elea (*) make an identical point: “The Stranger: A succession of nouns only is not a sentence, any more than of verbs without nouns. […] a mere succession of nouns or of verbs is not discourse. […] I mean that words like ‘walks,’ ‘runs,’ ‘sleeps,’ or any other words which denote action, however many of them you string together, do not make discourse. […] Or, again, when you say ‘lion,’ ‘stag,’ ‘horse,’ or any other words which denote agent – neither in this way of stringing words together do you attain to discourse; […] When any one says ‘A man learns,’ should you not call this the simplest and least of sentences? […] And he not only names, but he does something, by connecting verbs with nouns; and therefore we say that he discourses, and to this connection of words we give the name of discourse” (Plato, The Sophist).
Assuming that there is in the brain a network being the substrate for speech performance, what would then be the “element” – the smallest unit – to be stored in such a network? I hold that it would be the “word-pairs” just described, instead of words in isolation. Synaptic connections seem the perfect locus for such storage: the place where the building blocks of the brain’s biological network, the neurons, come together. Why not the isolated word? Because, as Aristotle sagaciously saw, “word-pairs” are true or false and, as we will see next, something being true or false is the first condition for it having an affective value, i.e. what sets in motion the dynamics of speech performance.
(*) Griswold notices that – apart from Parmenides – the anonymous stranger is the single figure in all the dialogues who speaks like a full-blown philosopher; he observes also that while Socrates is present in the Sophist he remains almost mute (Charles L. Griswold, “La naissance et la défense de la raison dialogique chez Platon,” in La naissance de la raison en Grèce, Paris: PUF, 1990: p. 365)
Thought as word dynamics. II. Architecture (4)
In Indo-European languages there are two types of words. Every speaker has a very strong intuitive feeling of this. We have no difficulty when defining the meaning – or offering a definition – of words of the first type: “a rose is a flower that has many petals, often pink, a strong and very pleasant fragrance, a thorny stem,” etc.; “a tire is a rubber envelope to a wheel, inflated with air,” etc. With the second type, we’re in real trouble: for instance with the word “nonetheless,” “it is used when one wishes to suggest that while a second idea may – at first sight – look contradictory to one first expressed, it is however the case, etc…” When trying to define a word like “nonetheless” I typically cannot bring myself to say that it “means” something, I’d rather claim – as I did above – that “it is used when…,” and revealingly I am forced to express this usage through quoting – if not a true synonym of it, at least, as with “however” – a word which is used in very similar contexts.
The first type of words are often called “content words,” the second “framework-” or “structure words” (1). Dictionaries have an easy time with the first and a rotten time with the second, doing as has been done here with “nonetheless”: resorting to the cheap trick of referring to a closely related word, the meaning – the usage – of which the reader is supposedly more familiar with. The English philosopher Gilbert Ryle interestingly called the first type “topic-committed” and the second “topic-neutral.” He wrote: “We may call English expressions ‘topic-neutral’ if a foreigner who understood them, but only them, could get no clue at all from an English paragraph containing them, what that paragraph was about” (Ryle 1954: 116).
In the technically unambiguous language used by the medieval logicians, the first were called “categoremes” and the second, “syncategoremes.” (2) Intuitively speaking we can understand this as meaning that “content-words” are essentially concerned with telling us what is the category, the “kind,” the “sort” of things we’re talking about; while the second type of words, the “framework-words,” are essentially playing a syntactic role, a “mortar” type of role – which would explain why we have trouble explaining what they “mean” and feel more comfortable describing how they’re being “used.”
The network I’m talking about is constituted of “content-words”: these are the building blocks of a network where roses connect with red and violets with blue. The other words, the “framework-words” are not part of this particular network, they’re stored in a different manner, they’re summoned to make the “content-words” stick together, as the mortar of a particular kind that will make these words, or such combinations of words, work together within a clause. Like what was mentioned in an attempt to give a definition for “nonetheless”: that it is used when the two states of things which are brought together may seem at first sight to be contradictory. In order to ease the clash, to relieve the affective discomfort that arises when contradictory states-of-affair are brought together, a word like “nonetheless” is pasted between the parties at war. With “nonetheless,” the states-of-affairs evoked come from distant places in meaning-space: bringing them together creates an imbalance that needs to be resolved. The talking subject who’s connecting in his speech the states-of-affairs which are on either side of the “nonetheless,” cringes. So s/he stuffs between them a “contradiction insulator,” a “compatibility patch” like nonetheless. And everything is once again fine. “The Duke knew that his best interest and the Princess’s too was that he wouldn’t try to see her again. Nonetheless, the following morning…” The “nonetheless” relieves my worry, I won’t care for that Duke any longer: if he’s that kind of fool, well, good for him! What do I care!
“Framework-words” are part of what I will call in Section 14 the “coatings”: the coatings that out of the words found in a finite path along the Network create a proper sentence.
(1) Not every language deals with such distribution of “content-” and “framework-words” in a similar way. Languages like Chinese and Japanese are much more sparing in their use of “framework-words” than Indo-European languages are. Archaic Chinese for one had very few of those and meaning was emerging essentially from the bringing together – without further qualification – of “content-words”.
(2) Ernest Moody sums up the issue in the following manner: “The signs and expressions from which propositions can be constructed were divided by the mediaeval logicians into two fundamentally different classes: syncategorematic signs, such as have only logical or syntactic functions in sentences, and categorematic signs (i.e. “terms” in the strict sense) such as have independent meaning and can be subjects or predicates of categorical propositions. We may quote Albert of Saxony’s (1316-1390) definitions of these two classes of signs or of “terms” in the broad sense.
‘A categorematic term is said to be one which, taken significatively can be a subject or predicate (…) of a categorical proposition. For example, those terms ‘man’, ‘animal’, ‘stone’, are called categorematic terms because they have a definite and determinate signification. A syncategorematic term, however, is said to be one which, taken significatively, cannot be the subject or the predicate (…) of a categorical proposition. Of the kind are these terms ‘every’, ‘not any’, ‘some’, etc. which are called signs of universality or particularity; and similarly, signs of negation such as this negative ‘not’, and signs of composition such as this conjunction ‘and’, and disjunctions such as this disjunctive ‘or’, and exclusive or exceptive propositions such as ‘other than’, ‘only’, and words of this sort’ (Logica I). In the 14th century it became customary to call the categorematic terms the matter of propositions, and the syncategorematic signs (as well as the order and arrangement of the constituents of the sentence), the form of propositions” (Moody 1953: 16-17).
Moody, E. A., 1953, Truth and Consequence in Mediaeval Logic, Amsterdam: North-Holland
Ryle, G., 1954, Dilemmas, The Tarner Lectures 1953, Cambridge: Cambridge University Press
Thought as word dynamics. I. General principles. (3)
A thoroughly “physical” account of the objective dynamics of speech performance will be provided later. In the meantime I’m indicating here that as far as talking subjects are concerned, their subjective experience of the dynamics of speech performance is – from the initiation of a speech act to its conclusion – one of an emotional or “affective” nature. A view commonly held is that emotions hinder the expression of rational thinking. Beyond a certain threshold, emotions may indeed lead to disarray and impair speech performance. In normal circumstances however, the “expression of one’s feelings” – which is the spontaneous way people describe the motive behind their speech acts – results in rational discourse. This is due to the network underlying speech performance being structured: channelling speech performance along ever-branching but constrained paths in such a way that the expression of one’s feelings engenders out of necessity one or more series of meaningful sentences.
People claim they speak to “express their feelings,” “to relieve themselves,” “to get that thing out of my system” and such is indeed the subjective experience of speech performance: talking subjects experience a situation ranging from minor to serious dissatisfaction (the causes of which I’ll fully investigate in Section 18) and “talk their heart out” until, having reached the end of a particular outburst of speech, they feel relieved: feeling once again of a “satisfied mind.” Until, that is, some renewed source of minor or major irritation launches the dynamics all over. I will show in Section 15 that from an objective point of view the dynamics is no doubt better described as the reaching of a potential well within a word-space under a minimization dynamics, but it can also justifiably be described as an “affective dynamics,” as in the eye of the talking subject the process is experienced as one of emotional relief. Also, the parameter determining the dynamics of the gradient descent within the word-space is the “affect” value associated with words (actually word-pairs as we’ll see) within the “word-space” that the network amounts to.
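The idea of speech as relief can be illustrated with a toy. What follows is not ANELLA and not the dynamics announced for Section 15: it is a greedy walk standing in crudely for gradient descent, and the function `utter`, the parameters `discharge`, `threshold` and `max_len`, and all the affect values are assumptions invented for the sketch. Each word-pair carries an affect value; the walk follows the most highly charged pair leaving the current word, lowers that pair’s value as it is “uttered,” and stops once no charged pair is left.

```python
def utter(start, affect, discharge=0.5, threshold=0.1, max_len=10):
    """Greedy walk: follow the most charged pair from each word, discharging it.

    `affect` maps (word, next_word) pairs to a hypothetical affect value and is
    modified in place, mimicking the relief brought by uttering the pair.
    """
    path = [start]
    current = start
    while len(path) < max_len:
        # Pairs leaving the current word that still carry enough tension.
        candidates = {pair: value for pair, value in affect.items()
                      if pair[0] == current and value > threshold}
        if not candidates:
            break  # nothing left to relieve: the outburst ends here
        pair = max(candidates, key=candidates.get)
        affect[pair] *= discharge  # uttering the pair lowers its affect value
        current = pair[1]
        path.append(current)
    return path

# Invented affect values, for illustration only.
affect = {("rose", "flower"): 0.9, ("rose", "red"): 0.4, ("flower", "petals"): 0.6}
total_before = sum(affect.values())
path = utter("rose", affect)
total_after = sum(affect.values())
print(path, total_before, total_after)
```

The path is finite and the total affect left on the network is lower after the walk than before, which is all the sketch is meant to show.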
Thought as word dynamics. I. General principles. (2)
It is undeniable that speech acts are produced from within a talking subject, as they are uttered through the mouth. This however does not necessarily imply that the network mentioned in the first principle (“Speech acts are generated as the outcome of a dynamics operating on a network”), along with its data, is stored within the talking subject. It can however be reasonably inferred that such is indeed the case, essentially for lack of a viable alternative hypothesis.
Suppose for a moment that the network were located elsewhere than within talking subjects, meaning that the substrate to speech acts would lie outside their bodies, whether broadcasting its information or constituting a repository accessed by talking subjects. There should then be circumstances when communication breaks down or is at least impaired because of some physical obstacle interfering with it. Nothing of the sort is observed with speech performance: individuals swimming at the very bottom of the ocean, walking on the moon or prisoners of a lead-coated concrete bunker don’t show any reduction in their capacity for speech.
Quite interestingly, it is a distinctive feature of some mentally disturbed individuals that they postulate the existence of such an outside source for speech acts and claim that their words or their inner speech is being interfered with by an obtrusive sender (*).
Once it is admitted that the network is indeed located within the talking subject, its likely container has been shown beyond any reasonable doubt to be the brain. Indeed lesions to the brain, whether accidental or clinically induced, as well as other types of interference, do impair speech performance in very general or specific ways. There is by now an abundant literature, initiated in the nineteenth century by the likes of Broca and Wernicke, showing what consequences specific lesions of the brain, or interference with its functioning, have in terms of aphasia or agnosia, i.e. impairments of various natures in speech performance or in thinking (the works of Oliver Sacks in the 1980s, The Man Who Mistook His Wife for a Hat in particular, have popularized such accounts).
It is worthwhile noticing however that such observations, taken in isolation, are insufficient to invalidate the hypothesis of the externality of the network: it could be the case indeed that lesions simply hinder reception from an external source, or impair the brain’s capacity to tap an outer repository. It is only once it is admitted as most plausible that the body of the talking subject holds the network (being the substrate for speech performance) that the brain proves to be its most probable location.
Beyond this deductive probability, is there any further plausibility that the brain contains the type of network discussed here? There is indeed: the brain is known to contain a particular network constituted of nerve cells or neurons. In the coming sections it will be my concern to check if the network in question and the one made of nerve cells can possibly be the same.
(*) I will show below (section 21) why it should be expected from a network the connectedness of which is broken that it assumes it cannot be itself the source of the speech performance it utters. With connectedness lost, the disconnected parts of the network have ceased to communicate, they generate speech independently: the emergence of speech acts from another part of the network is perceived as being from an external source by every one of the other parts.
Thought as word dynamics. I. General principles (1)
The general hypothesis is that “speech acts are generated as the outcome of a dynamics operating on a network”; it is specific as far as it states that the data, the “words,” summoned in the generation of speech acts are structured as a network. It is also informative when it distinguishes two parts in the mechanism: an architecture, being the network itself, and a “dynamics” – so far unqualified – operating on it. To a large extent the hypothesis states the obvious, as speech performance unfolds in time and is therefore out of necessity a dynamic process; also, any dynamics necessarily operates on a substrate constituting its architecture. In the case of speech performance this architecture automatically comprises the building blocks of speech acts, i.e. the words that get combined sequentially into speech acts.
Also, unless the full complexity of speech performance is assigned to its dynamics, it is reasonable to assume that to some extent it reflects the static organization of the data. There is no hard evidence disallowing the converse view that speech performance results from an extremely complex dynamics operating on unstructured data, sentences being generated through picking individual words on demand from a repository where they are randomly stored. At the same time, this converse hypothesis would suppose a highly uneconomical method for dealing with the task of generating a sequentially organized output. This would be unexpected as it has been observed that as soon as biological processes reach some level of complexity, that complexity is economically distributed between the substrate and the dynamics operating on it (this is the case with the sense organs, for instance, which deal partially with the complexity of information processing through the complexity of the organ itself).
If data (“words”) are to some extent organized within their repository, one obvious avenue for modelling this organization is to represent it by way of the mathematical object known as a graph (a set of ordered pairs). A connected graph (1) is what is being referred to in non-technical terms as a “network.” In other words saying that the dynamics of speech performance operates on a network amounts to saying simply that its substrate of words is “in some way” and “in some degree” organized. Adding that this network is connected amounts to saying that the full lexicon known to the speaker is available whenever a single clause is generated (2).
(1) I’ll show further down (section 21) that the connectedness of the graph is a condition for the rationality of the speech acts uttered by a talking subject.
(2) As will be postulated below (section 21), what happens with psychosis is that only part of the lexicon is available at any one time for speech performance. Neurosis (section 20) would correspond to the less dramatic circumstances when individual words and therefore particular paths in the network are inaccessible, the whole lexicon remaining otherwise accessible, even if sometimes only through convoluted and cumbersome ways.
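The connectedness condition, and the contrast drawn in footnote (2), can be made concrete with a small reachability check. This is a sketch under stated assumptions: pairs are treated as traversable in both directions, and the function names and toy lexicons are invented for the example.

```python
from collections import defaultdict, deque

def reachable(start, pairs):
    """Return every word reachable from `start`, following pairs in either direction."""
    neighbours = defaultdict(set)
    for a, b in pairs:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen = {start}
    queue = deque([start])
    while queue:
        word = queue.popleft()
        for nxt in neighbours[word]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def is_connected(pairs):
    """True when the whole lexicon can be reached from any one of its words."""
    lexicon = {word for pair in pairs for word in pair}
    if not lexicon:
        return True
    return reachable(next(iter(lexicon)), pairs) == lexicon

# Toy lexicons, invented for illustration.
whole = [("rose", "flower"), ("rose", "red"), ("violet", "flower"), ("violet", "blue")]
broken = [("rose", "flower"), ("rose", "red"), ("violet", "blue")]

print(is_connected(whole))    # the full lexicon is available from any word
print(is_connected(broken))   # the lexicon has split into separate parts
print(sorted(reachable("violet", broken)))
```

In the broken network, a walk starting from “violet” can only ever reach part of the lexicon, which is the situation footnote (2) associates with psychosis.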
On two occasions already on my French blog, I’ve written an article in a serialized format, posting each part whenever it was ready, then creating a link to a copy of the whole text when it had reached a final form. In both instances the process took about two months (Les tâches et les responsabilités qui sont aujourd’hui les nôtres ; Ce qu’il est raisonnable de comprendre et partant d’expliquer).
I will proceed in the same way with Thought as Word Dynamics. I envision that the process will take longer as the text is structured at inception as having twenty-five chapters. I will try to post simultaneously an English and a French version.
I worked full-time as an Artificial Intelligence researcher from 1988 to early 1990. My final report for British Telecom (Martlesham Heath – U.K.) is entitled An alternative neural network representation for conceptual knowledge. My work at the Laboratoire d’Informatique pour les Sciences de l’Homme (Paris) led to a book entitled Principes des systèmes intelligents (Paris: Masson, 1990).
Thought as Word Dynamics
The model presented here has been built over a number of years from several angles, combining theoretical knowledge with feedback obtained from implementing it as a piece of software. I regard philosophy, a twenty-five-century speculative pursuit by the best minds of every period, as a legitimate source of knowledge on cognition. Some other – and possibly unlikely – sources have proved to be of essential benefit for both this study and my previous work in Artificial Intelligence: Freudian psychoanalysis, mediaeval contributions to logic and the work of the ancient Chinese logicians.
The ambition here is to provide a framework for speech acts, being specific enough about both its architecture and its dynamics to be testable as an Artificial Intelligence application. The test began several years back when, being part of British Telecom’s “Connex” Project, I designed ANELLA as an “Associative Network with Emergent Logical and Learning Abilities.”
I. General principles
1. Speech acts are generated as the outcome of a dynamics operating on a network
2. The network in question is stored in the human brain
3. A talking subject experiences the dynamics of speech generation as emotional or “affective”
4. The network comprises a subset of the words (the “content words”) of a particular natural language
5. The individual unit in the network as far as speech generation is concerned is a word-pair
6. Each such word-pair has at any time an affect value attached to it
7. The affect value of the word-pairs results from Hebbian reinforcement
8. The network has two principles of organization: hereditary and endogenous
9. The hereditary principle is isomorphic to the mathematical object called a “Galois Lattice”
10. The endogenous principle is isomorphic to the mathematical object called a “P-graph”
11. The endogenous principle is primary
12. The hereditary principle is historical: it allows syllogistic reasoning and amounts to the emergence of “reason” in history
13. The skeleton of each speech act is a path of finite length in the network
14. A speech act is the outcome of several “coatings” on a path in the network
15. The generation of a speech act is a gradient descent in the phase space of the network when submitted to an affect dynamics
16. The utterance of a speech act modifies the affect values of the word-pairs activated in the act
17. The gradient descent (relaxation) restores an equilibrium in the network
18. Imbalance in the affect values attached to the network has four possible sources
1. Bodily processes experienced by the speaking subject as “moods”
2. Speech acts of an external origin, heard by the speaking subject
3. Speech acts of an internal origin: thought processes as “inner speech” or hearing oneself speak (being a sub-case of 2.)
4. Empirical experience (perception)
19. In the healthy subject, each path has inherent logical validity; this is a consequence of the topology of the network
20. Neurosis results from imbalance of affect values on the network impairing normal flow (Freudian “repression”)
21. Psychosis amounts to defects in the Network’s structure (Lacanian “foreclosure”)
22. Speech generation is automatic and only involves the four sources mentioned above (18)
23. Speech generation is deterministic
24. There is no room for any additional “supra-factor” in speech act generation than the four mentioned above (18)
25. One such superfluous “supra-factor” would be “intentionality,” triggered by consciousness or otherwise
So, Marvin Minsky has just published a new book called The Emotion Machine (Simon & Schuster 2007) where he states that Artificial Intelligence should rest on the observed feature that intelligence is emotional by nature. This of course rings a bell, as some twenty years ago an audacious AI engineer, traveling between Martlesham Heath (Suffolk) and Paris (France), wrote ANELLA (Associative Network with Emergent Logical and Learning Abilities), a piece of software mimicking logical reasoning on a body of knowledge it had constituted through asking questions only. I say “mimicking” as there were no rules of logic in ANELLA; whatever logic could be seen was generated by ANELLA’s affect dynamics and driven by the feedback it was receiving from its users.
The author of ANELLA was your humble servant indeed, having been granted at the time an Academic Fellowship by British Telecom.
In those days, the single psychological school having paid attention to emotion as the driver of human intelligence and of human behavior altogether was of course psychoanalysis. In 1987 I was both writing AI source code and training to become a psychoanalyst: ANELLA allowed me to combine both. In that same year I published in L’Âne, the literary magazine led by Judith Lacan-Miller, “Ce que l’Intelligence Artificielle devra à Freud”: “What Artificial Intelligence will owe Freud”. Three years later my book Principes des systèmes intelligents came out where I described ANELLA’s philosophy and concept. The book has become, to its author’s delight, a minor classic. No doubt that Minsky’s call to arms will mean that flocks of English-speaking publishers will now vie to publish this pioneering work in their native idiom!