Special Section: A Brief History of AI

For thousands of years there has been the notion of a thinking automaton, or machine, such as the bronze man Talos from the Argonautica (third century BCE) or the golems in the Talmud. As technology advanced, machines that appeared to think or act of their own volition began to appear. In 1739 Jacques de Vaucanson exhibited the Canard Digérateur (Digesting Duck); the duck would quack, muddle water with its bill, eat grain pellets, and later excrete them (Wood, 2003). The claim that it digested the food was later revealed to be a clever trick rather than actual digestion. Much like the Schachtürke (Mechanical Turk), which purported to be a master chess-playing automaton (Levitt, 2000), the Canard Digérateur was an impressive mechanism whose designers intended to fool others about its actual abilities. Given this history of deception surrounding ‘thinking machines’, it might seem surprising that Alan Turing (1950) proposed judging machine intelligence by whether a machine could fool us into thinking it was a person. Nevertheless, by 1950 computers could store and execute commands, although doing so was vastly expensive (Williams, 1997). Technology was enabling computing machinery that could not have been built before.

The term “Artificial Intelligence” was coined in 1956 at a summer research conference at Dartmouth College organized by John McCarthy. Allen Newell and Herbert Simon attended and demonstrated Logic Theorist (1956), a program developed with Cliff Shaw that used heuristics to prove mathematical theorems similar to those in Principia Mathematica by Whitehead and Russell (1910). Prior to this point, computers did not use heuristic programming, and Logic Theorist is arguably the first AI program. Its success inspired Newell and Simon to develop the General Problem Solver (1959) and ushered in a golden era of AI research.

Inspired by the biological neuron models of Warren McCulloch and Walter Pitts (1943), Frank Rosenblatt developed an electronic device with the capability to learn, which he called a ‘perceptron’, in 1958. These networks excelled at pattern recognition. However, in 1969 Minsky and Papert published a critique of the limits of two-level networks (an input and an output layer), which stalled the perceptron’s success even though Rosenblatt had proposed networks with more than two levels. The debate between proponents of analytical (symbolic) AI and proponents of neural networks continued for years.
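To make the perceptron, and the Minsky and Papert critique, concrete, here is a minimal sketch in Python. It is an illustration only, not Rosenblatt’s hardware or any historical code, and the function names are invented for the example. A single perceptron adjusts its weights in response to its errors and can learn the linearly separable AND function, but no amount of training lets it learn XOR, the kind of limitation Minsky and Papert emphasized; networks with more layers were the eventual remedy.

```python
# Minimal perceptron sketch (toy illustration, not Rosenblatt's original device).
# It learns logical AND (linearly separable) but cannot learn XOR.

def train_perceptron(samples, epochs=20, lr=0.1):
    w0, w1, bias = 0.0, 0.0, 0.0
    for _ in range(epochs):
        for (x0, x1), target in samples:
            output = 1 if (w0 * x0 + w1 * x1 + bias) > 0 else 0
            error = target - output          # perceptron learning rule
            w0 += lr * error * x0
            w1 += lr * error * x1
            bias += lr * error
    return w0, w1, bias

def predict(weights, x0, x1):
    w0, w1, bias = weights
    return 1 if (w0 * x0 + w1 * x1 + bias) > 0 else 0

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

for name, data in [("AND", AND), ("XOR", XOR)]:
    w = train_perceptron(data)
    correct = sum(predict(w, *x) == t for x, t in data)
    print(f"{name}: {correct}/4 correct")  # AND: 4/4; XOR: at most 3/4
```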

The general optimism about AI that followed is aptly reflected by a Minsky quote from Life magazine, “From three to eight years we will have a machine with the intelligence of a human being” (Darrach, 1970). Research began to hit a wall, however, and less money was invested in AI. Using terms such as ‘thinking’ or ‘intelligence’ fell out of fashion following Drew McDermott’s criticism of their misuse in his paper “Artificial Intelligence Meets Natural Stupidity” (1976). As complex problems with vast search spaces pushed against technical limits, researchers focused on problems with a well-defined scope and referred to the work as ‘applied artificial intelligence’.

AI programming then shifted to focus on knowledge gathered from human experts. Edward Feigenbaum’s work produced some of the first ‘expert systems’, which proved successful at tasks such as identifying compounds from spectral readings (Buchanan & Feigenbaum, 1978). By restricting the search to a narrow, well-defined domain, built from specific knowledge acquired from real-world experts on a given topic, these machines could manage the search space and successfully find answers. Non-experts could then use the systems for assistance, and for the first time such systems could be directly applied in industry.

Japan’s Fifth Generation Computing initiative provided a significant source of funding for massively parallel and concurrent logic systems (Shapiro, 1983). The attention helped revive interest in connectionist architectures (neural networks). During this time, techniques that would later be grouped under “deep learning” became more popular (Rumelhart & McClelland, 1987) and analytic tools progressed. For example, adding multiple layers to networks and employing feedback proved fruitful and was eventually applied to optical character recognition (Russell & Norvig, 2003). Likewise, developments in a gradient estimation method called “backpropagation” (Devaney, 1982), used to train neural networks, began to produce significant results (Rumelhart et al., 1986). Backpropagation is still used today in many areas, such as speech recognition and language processing (Janciauskas & Chang, 2018). Its efficiency also underlies Sophia, an optimizer for training language models developed at Stanford (Liu et al., 2024).
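As a rough illustration of the idea, not any particular historical implementation, the sketch below trains a tiny two-layer network on XOR (the very task a single perceptron cannot learn) using backpropagation in NumPy. A forward pass computes predictions, the error is passed backward through each layer via the chain rule, and the weights are nudged by gradient descent. The network size, learning rate, and step count are arbitrary choices for this toy example.

```python
# Toy two-layer network trained with backpropagation on XOR (illustrative sketch).
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# Randomly initialized weights: 2 inputs -> 8 hidden units -> 1 output.
W1 = rng.normal(size=(2, 8))
b1 = np.zeros((1, 8))
W2 = rng.normal(size=(8, 1))
b2 = np.zeros((1, 1))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 1.0
for step in range(10_000):
    # Forward pass.
    h = sigmoid(X @ W1 + b1)       # hidden activations
    out = sigmoid(h @ W2 + b2)     # network output

    # Backward pass: propagate the error back through each layer (chain rule).
    d_out = (out - y) * out * (1 - out)   # gradient at the output layer
    d_h = (d_out @ W2.T) * h * (1 - h)    # gradient at the hidden layer

    # Gradient-descent weight updates.
    W2 -= lr * h.T @ d_out
    b2 -= lr * d_out.sum(axis=0, keepdims=True)
    W1 -= lr * X.T @ d_h
    b1 -= lr * d_h.sum(axis=0, keepdims=True)

print(out.round(2).ravel())  # should approach [0, 1, 1, 0]
```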

With the growth of the internet, massive, complex datasets, or “big data”, allowed for a fruitful synergy with neural networks: deep learning required large datasets, and big data was beginning to fulfill that need. Meanwhile, Moore’s law, the observation that the number of transistors on a microprocessor grows exponentially (Moore, 1965), meant that computing power continued to grow rapidly over the next decade. Advances in graphics cards, or graphics processing units (GPUs), also began to contribute to successes in image detection. Even with these successes, researchers were leery of using the moniker “AI”, given its connotations of failed promises and science fiction (Markoff, 2005).

The ImageNet competition in 2012 was a great step forward for deep learning. Just two years earlier, a breakthrough had occurred when researchers showed that deep neural networks trained with backpropagation could achieve record results in handwriting recognition (Ciresan et al., 2010). During the competition, AlexNet achieved a top-5 error rate of 15.3% using a convolutional neural network, a type of feed-forward network whose layers apply learned filters across an image and share their weights (Krizhevsky et al., 2012). This was more than ten percentage points better than all prior attempts. Part of this success came from relying heavily on GPUs. The result sparked massive work on image recognition and produced networks with roughly 95% accuracy by 2017 (Gershgorn, 2017). Beyond image recognition, generative adversarial networks were showing promise at generating novel images and other synthetic data (Goodfellow et al., 2014). Underlying this success is access to a massive amount of computing hardware; training even a ‘modest’ network of that era could require some 2,000 CPU cores working together (Dean et al., 2012).

As late as 2016, natural language processing models were typically trained on a single domain of text, such as news articles (Jozefowicz et al., 2016). Training on a wide range of datasets across multiple domains was recommended to increase success (Radford et al., 2018). Large language models (LLMs), vast neural networks used for classification and natural language processing, focus on statistical relationships among words across enormous collections of text. They frequently use self-supervised learning, in which the model learns by predicting held-out portions of its own training text, and this approach began to demonstrate real success. In 2022, ChatGPT catapulted AI into the public eye. While a novelty at first, careful prompt engineering can coax very useful outputs from it (Lock, 2022). OpenAI followed by releasing multi-modal improvements with GPT-4 (Wiggers, 2023). Meanwhile, several other LLMs have been released or are in development, such as Google’s PaLM and Meta’s LLaMA. It is important to recognize that the computational requirements of LLMs are vastly greater than those of previous neural networks; GPT-3, for example, required roughly 285,000 CPU cores and 10,000 GPUs to train (Tauscher, 2020).
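As a drastically simplified illustration of the “statistical relationships” idea, the toy sketch below counts which word follows which in a tiny corpus and then generates text by repeatedly sampling a likely next word. The corpus and function names here are invented for the example, and real LLMs replace these counts with billions of learned neural-network parameters, but the self-supervised objective, predicting the next token from the preceding ones, is the same in spirit.

```python
# Toy next-word model: a bigram counter standing in for the
# self-supervised "predict the next token" objective behind LLMs.
import random
from collections import Counter, defaultdict

corpus = (
    "the cat sat on the mat . the dog sat on the rug . "
    "the cat chased the dog . the dog chased the cat ."
).split()

# "Self-supervised" training data: each word is labeled by the word that follows it.
next_word_counts = defaultdict(Counter)
for current, following in zip(corpus, corpus[1:]):
    next_word_counts[current][following] += 1

def generate(start="the", length=10, seed=0):
    random.seed(seed)
    words = [start]
    for _ in range(length - 1):
        counts = next_word_counts[words[-1]]
        choices, weights = zip(*counts.items())
        words.append(random.choices(choices, weights=weights)[0])
    return " ".join(words)

print(generate())  # e.g. "the dog sat on the mat . the cat chased"
```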

Currently, AI seems ubiquitous in the media, and this should raise some serious questions. First, is it really accomplishing what it claims? It is not clear whether we are ascribing intentionality to AI chatbots that are ‘stochastic parrots’, generating text without a clue to its meaning (Bender et al., 2021). To date, it is clear that most AI chatbots do not understand the context of questions or the social appropriateness of their generated answers. For example, when an Asian graduate student asked an AI image tool (Playground AI) to make her LinkedIn profile photo look more professional, it rendered her with a Caucasian appearance (Singh, 2023). Relying on AI for decisions that can affect lives can be dangerous, and with an estimated $75 billion currently invested in AI (Amdur, 2023), there is great incentive for unscrupulous behavior.

Much like the Schachtürke (Mechanical Turk), today’s AI tools may appear to provide a service they do not genuinely deliver. To guard against such deception and to protect people from the potential harms these mistakes may cause, we need to remember that understanding the architecture is critical. By better understanding these tools and their effects on our lives, we can benefit from them while ensuring their ethical and responsible application.

License

Optimizing AI in Higher Education: SUNY FACT² Guide, Second Edition Copyright © by Faculty Advisory Council On Teaching and Technology (FACT²) is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License, except where otherwise noted.