The Turing Test

By Lynellen D. S. Perry
September 21, 1995

In 1950, Alan Turing wrote about a test that would "refute anyone who doubts that a computer can really think: if an observer cannot distinguish the responses of a programmed machine from those of a human being, the machine is said to have passed the Turing test" (Gardner 1987). In 1991, the first Turing test competition was conducted by the Computer Museum in Boston. To make this test practical, the programs had to respond to only a small subset of topics, including women's clothing, Burgundy wine, and romantic relationships (Casti 1994). But the restriction of topics also takes the teeth out of the test, for it is much easier to converse about a restricted range of topics than to demonstrate general conversational ability.

The controversy around the Turing test is that it doesn't seem to be very general and it defines intelligence purely in terms of behavior. I think that the Turing test is not an adequate test of intelligence. Conversation is not the ultimate display of intelligence, and real thinking is not indicated by spitting out sentences when that is all the computer is programmed to do.

In some ways, this test reminds me of the stories about the intelligence tests that immigrants had to take when arriving at Ellis Island at the turn of the twentieth century. The Ellis Island tests were culturally dependent (like identifying the correct sequence of pictures that depict an event, or identifying smiling or sad faces), just as today's intelligence tests have a large cultural component. Even worse, a Turing test with a restricted domain is like an intelligence test which only accurately measures white males who watched a lot of television in 1970.

The Turing test focuses too much on the behavior of conversation. Just because I am speaking to a fellow student who learned their lousy Pig-Latin in elementary school does not mean that the student is not intelligent. Likewise, the fact that a computer program sometimes miscues and says something that makes no sense does not necessarily mean it is not intelligent. The other side of the coin occurs when a person who has very extensive knowledge in a small domain can seem to be a computer due to this intricate knowledge. This doesn't necessarily imply intelligence either, since it says nothing about the person's ability to learn, to handle a new situation, or merely to converse about some other subject. Conversational skills are not the ultimate sign of intelligence, even in a world where communication media are so pervasive.

Casti (1994) writes about philosopher Ned Block's argument against the Turing test. If we were to "write down a tree structure in which every possible conversation of less than five hours' duration is explicitly mapped out" (Casti 1994) and then have the computer follow this tree during the conversation, it would have all the appearances of being intelligent. But this "strongly suggests that the machine has no mental states at all ... Intelligence is not just the ability to answer questions in a manner indistinguishable from that of an intelligent person; to call a behavior intelligent is to make a statement about how that behavior is produced" (Casti 1994). Following a preset structure is not intelligent, but being able to adapt dynamically in real time is intelligent.
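To make Block's thought experiment concrete, here is a minimal sketch in Python of what "following the tree" could look like. The tiny tree, its remarks and replies, and the names TREE and converse are all hypothetical illustrations of mine, not anything taken from Block or Casti; a real tree covering five hours of conversation would be astronomically large.

```python
# Hypothetical miniature of a conversation tree: each remark the machine can
# handle is a key, and each node stores a canned reply plus the subtree of
# remarks it can handle next. All content here is invented for illustration.
TREE = {
    "hello": {
        "reply": "Hi there. What shall we talk about?",
        "next": {
            "wine": {"reply": "I am fond of Burgundy.", "next": {}},
            "clothes": {"reply": "I rarely follow fashion.", "next": {}},
        },
    },
}


def converse(tree, remarks):
    """Walk the tree one remark at a time and collect the stored replies.

    A remark that is not mapped at the current position yields None and
    ends the conversation, since the machine has nothing to fall back on.
    """
    replies = []
    branches = tree
    for remark in remarks:
        node = branches.get(remark)
        if node is None:
            replies.append(None)
            break
        replies.append(node["reply"])
        branches = node["next"]
    return replies


if __name__ == "__main__":
    print(converse(TREE, ["hello", "wine"]))
    print(converse(TREE, ["hello", "chess"]))
```

Nothing in this program adapts: every reply was fixed before the conversation began, which is exactly the property Block's argument turns on.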

Since behavior alone is not a test of intelligence, what exactly is intelligence? How can it be noticed or observed? This alone is a large debate in the Artificial Intelligence community, just as it is debated in the psychology community and the philosophy community. Henley (1991) argues that most AI applications under development today are "pragmatic" in their definition of intelligence. "If it functions in the same capacity as an intelligent, indeed expert, human being then that is that. Intelligent in these situations is defined in the practical terms of cost/benefits and bottom line performances" (Henley 1991). This may seem to be a problem until he points out that neither philosophy nor psychology has reached a consensus about the definition of intelligence. So perhaps a pragmatic definition is not so horrible.

Newell and Simon say that intelligence will involve "the use and manipulation of various symbol systems, such as those featured in mathematics or logic" (Gardner 1987). Others suggest intelligence includes such things as "feelings, creativity, personality, freedom, intuition, morality, and so on" (Haugeland 1985). Before we can definitively test for the presence of intelligence, we must arrive at a consensus about its definition.

Another factor to look at in this quest is the purpose of building artificial intelligences. Are we trying to simulate human minds in order to experiment with hypotheses about how they work? Or are we interested solely in the end result? If we are only interested in the consequences of a program's execution, its output, then perhaps the Turing test is applicable. In this type of situation, it doesn't matter how the program arrived at the response, but merely that the output matched, to some degree, the output that would be expected from a human. The appearance of intelligence could be sustained by a program that had merely a large enough database of pre-programmed responses (or a fast method of generating a response) and a good pattern recognizer that could trigger the appropriate output.
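A rough sketch of such a program, written in Python, might look like the following. It is only an illustration under my own assumptions: the rules, the canned responses, and the names RULES, DEFAULT_RESPONSE, and respond are hypothetical, and the "pattern recognizer" is nothing more than a keyword search using regular expressions.

```python
import re

# Hypothetical "database" of pre-programmed responses. Each entry pairs a
# keyword pattern with the canned output it should trigger.
RULES = [
    (re.compile(r"\b(wine|burgundy)\b", re.IGNORECASE),
     "Burgundy is my favorite region."),
    (re.compile(r"\b(dress|skirt|clothes|clothing)\b", re.IGNORECASE),
     "Fashion changes so quickly."),
]

# Used when no pattern matches, so the program always has something to say.
DEFAULT_RESPONSE = "Tell me more."


def respond(text):
    """Return the canned response of the first rule whose pattern matches."""
    for pattern, response in RULES:
        if pattern.search(text):
            return response
    return DEFAULT_RESPONSE


if __name__ == "__main__":
    print(respond("Do you like wine?"))
    print(respond("What about chess?"))
```

A judge who stays within the covered topics sees plausible replies, yet the program does nothing beyond matching input to stored output.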

But if we care at all how the output was produced, then this type of simple program will not demonstrate "intelligence". Rather, we would say that its output was not the result of any real thinking but merely a set of rules for matching input to output. This reminds me of Searle's Chinese room. The output appears to be intelligent because it correctly translates the input, but there is no real understanding or intention or purpose behind the output. The output is generated by blindly following a set of rules.

I think that the process of arriving at a result is part of intelligence. Intelligence partly involves sensing the environment, processing that information, and acting to make changes. An intelligent program must be able to handle previously unencountered situations, be able to operate effectively with inaccurate and incomplete knowledge, be able to learn from past experience, have access to a large amount of common sense knowledge, and be able to predict possible outcomes and prepare for them. A conversational test cannot adequately test these requirements. We probably cannot test adequately for intelligence until we have learned to mate machine intelligence with robotics so that the combination can manipulate a physical environment in order to test its hypotheses.

It seems that this sort of test, though not conversational, is again focused on behavior. It may be that the only way we can measure human intelligence is by evaluating the actions taken in various situations. This is because we do not have access to the inner states of the human mind, so we don't completely understand what sequence of states would constitute intelligence while any other sequence would indicate a lack of it. However, we could have access to the states a program goes through. But just monitoring these leads to a dangerous assumption. If we did know the exact way a human mind works, and the computer program did not follow the same method (but got the same results), then some might say that the program wasn't intelligent because it didn't do things the way humans do. So we have to ask if the goal of artificial intelligence is to exactly model the way a human does things, or is the goal to solve the same types of problems that humans solve (or problems that humans have a hard time with).

In either case, the Turing test is not a good test for intelligence. If the computer performs better than the human and thus gives itself away, then it has technically failed the test. If a judge is comparing the problem solving methods of the human and the computer and they don't match, then the computer fails again even if it is able to solve the problem when the human was not. While the Turing test is not adequate, it does seem to me that we will end up with some variety of behavioral test when looking for intelligent programs. In the meantime, we still have a way to go in getting everyone to agree on a definition of intelligence.

References

Casti, John L. 1994. Complexification. HarperCollins Publishers.

Gardner, Howard. 1987. The mind's new science. Basic Books.

Haugeland, J. 1985. Artificial intelligence: the very idea. Cambridge, Mass: MIT Press.

Henley, Tracy B. 1991. Natural problems and artificial intelligence. Behavior and Philosophy 18:2, p43-55.