Artificial Intelligence (AI) is usually conceptualized as the field concerned with developing machines that are able to think and learn like human beings. Learning refers to improving performance on future tasks based on past behavior. Enabling an agent to learn is the concern of so-called Machine Learning (ML). As a sub-field of ML, the Artificial Neural Network (ANN), being grounded in the biology of the brain, purports to develop computers that are able to learn based on various techniques such as parallel distributed processing, neural computation and connectionism. One of the most widely used ANN types is the supervised learning network, in which the network is given an input together with the correct output and learns how to map inputs to outputs. The focus of this paper is on supervised learning networks, which keep developing rapidly.
Being grounded in the biology of the human brain, ANN makes use of artificial neurons such as the perceptron and the sigmoid neuron. An artificial neuron receives input from preceding neurons and decides whether to ‘fire’ to the next neurons. It evaluates each input according to its own perspective and then sums all inputs to get a holistic view, taking into account its evaluation of the available inputs, its internal judgment system, and its threshold, referred to as weights, activation function and bias respectively. By changing the weights and bias, a completely different decision can be reached. This phenomenon is also well known in connectivism, a theory in the cognitive science literature which asserts that networks exist everywhere. In a similar vein, ANN researchers sometimes conceive of a single neuron as a person and at other times conceive of the whole network as a single human brain. In addition, according to both the ANN and the connectivism literature, each neuron has its own bias and can therefore work in a completely different way. If an ANN is to learn the right weights and biases, a small change in one weight or bias should lead to only a small change in the network output. In order to be able to learn, an ANN should move slowly and smoothly from one decision to another. For these reasons, ANN researchers have examined many alternative soft activation functions such as sigmoid, tanh and the rectified linear neuron.
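The requirement that a small change in a weight should cause only a small change in output can be illustrated with a minimal sketch. The function names and all numeric values below are illustrative assumptions, not taken from the literature cited here; the sketch simply contrasts a hard-threshold perceptron with a sigmoid neuron when one weight is nudged slightly.

```python
import math

def weighted_sum(inputs, weights, bias):
    """Sum of each input times its weight, plus the bias."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias

def perceptron(inputs, weights, bias):
    """Hard-threshold neuron: fires (1) only if the weighted sum is positive."""
    return 1 if weighted_sum(inputs, weights, bias) > 0 else 0

def sigmoid_neuron(inputs, weights, bias):
    """Soft neuron: squashes the weighted sum into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-weighted_sum(inputs, weights, bias)))

# Illustrative values only (assumptions chosen to sit near the threshold).
inputs = [1.0, 1.0]
weights = [0.5, -0.5]
bias = 0.0005
nudged_weights = [weights[0] - 0.001, weights[1]]  # tiny change to one weight

perceptron_before = perceptron(inputs, weights, bias)
perceptron_after = perceptron(inputs, nudged_weights, bias)
sigmoid_before = sigmoid_neuron(inputs, weights, bias)
sigmoid_after = sigmoid_neuron(inputs, nudged_weights, bias)

print("perceptron:", perceptron_before, "->", perceptron_after)
print("sigmoid:   ", sigmoid_before, "->", sigmoid_after)
```

With these values the perceptron's decision flips completely, whereas the sigmoid output shifts by only about 0.00025, which is the kind of gradual behavior a learning algorithm can exploit.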
On the other hand, one of the common criticisms of connectivism is that it oversimplifies the interaction among nodes, as a connection can only be either active or inactive. However, proponents of connectivism have argued that a connection is not necessarily sharp. This graded view of a connection is congruent with the sigmoid neuron function. In addition, regarding the speed of learning, according to current theories in cognitive science such as connectivism, decision-making embeds a learning process based on a shifting reality. So, there might be ‘changes in the information climate affecting the decision’ (Siemens, 2005, p. 5) due to this shifting reality. In a similar way, the output of a soft neuron can move smoothly from 0 to 1, so that the output is a real number rather than a binary value.
One of the most widely used neuron types in ANN is the sigmoid neuron. Contrary to commonly held assumptions, the neuron does not in fact learn anything; only its bias changes, yet this does not modify its decision. In order to resolve this issue, researchers try to make learning independent of the activation function. A sensible question to ask would be whether the human internal judgment system is reflected by such a function. To give a specific example, if an individual decides to share his opinion or to make no comment at all, would that signify a saturation level of his knowledge? Based on such questions, proponents of connectivism made the following classification of mental interactions (ranked from concrete to abstract and from surface to deep cognitive engagement) (Siemens, 2005):
- Operation interaction:
- Way-finding interactions:
- Sense-making interaction:
- Innovation interaction:
Pattern recognition, aggregation, decision-making and sharing reside at the level of sense-making interaction, which implies that information sharing, or similar acts such as developing a network through gathering, remixing and repurposing connections, is not the same thing as the sigmoid function. A single neuron entails its own logic (in the sense of applying logic gates such as NOT, AND, OR), leading to a network of logic gates. Therefore, meaning exists in the pattern of connections, which is also one of the underpinning ideas of connectivism. The meaning emerging out of a group of nodes plays the more important role, as the whole is greater than the sum of its parts. In order to develop a learnable network, the network architecture should be formed in a certain way by arranging neurons. To do this, it is useful to bear in mind that an ANN contains the following levels of abstraction:
- ANN as a human brain referring to innate individual capacities
- ANN as a group of individuals referring to the way of arranging a network
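The remark that a single neuron entails its own logic, while meaning emerges from the pattern of connections, can be made concrete with a small sketch. The weights and biases below are hand-picked assumptions (many other choices work equally well), and the helper name `step_neuron` is hypothetical. Each gate is a single threshold neuron; XOR, which a single threshold neuron cannot compute, only appears once several neurons are connected.

```python
def step_neuron(inputs, weights, bias):
    """Threshold neuron: outputs 1 if the weighted sum plus bias is positive."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if z > 0 else 0

# Single neurons acting as logic gates (hand-picked, illustrative parameters).
def NOT(a):
    return step_neuron([a], [-1], 0.5)

def AND(a, b):
    return step_neuron([a, b], [1, 1], -1.5)

def OR(a, b):
    return step_neuron([a, b], [1, 1], -0.5)

# A small network of neurons: XOR = (a OR b) AND NOT (a AND b).
def XOR(a, b):
    return AND(OR(a, b), NOT(AND(a, b)))

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "AND:", AND(a, b), "OR:", OR(a, b), "XOR:", XOR(a, b))
```

None of the individual neurons 'knows' XOR; the function exists only in how the three gates are wired together.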
Because an ANN is a universal modelling system, it can in principle approximate any given function regardless of the neuron type used. The key is how to arrange the neurons so that the learning algorithm can discover the related weights and biases. The most common features of an ANN architecture are:
- number of layers
- information flow
- connectivity among neurons
The initial idea behind a deep network goes back to the concept of splitting a complex problem. When trying to resolve a complicated problem, individuals tend to break it into a set of smaller problems and then recombine them in order to solve the complete problem. The underpinning assumption was that while the first hidden layer might focus on the first level of the problem, the subsequent layers would focus on the next levels. Therefore, in contrast to a shallow neural network, a deep neural network possesses two or more hidden layers. Yet, the terms ‘shallow’ and ‘deep’ might be misleading, as they do not refer to surface and deep learning. In addition, apart from providing more precise results based upon further training, there is no special functionality provided by deep neural networks. Besides, the concept of a sequence of layers runs counter to the idea of connectivism, in the sense that organizing neurons in layers by re-arranging their positions spatially would still constrain connectivity, given that each layer is represented by mathematical vectors for the related weights and biases. Based on the information flow within a network, ANNs can be categorized as follows (Goodfellow et al. 2016):
- Feedforward networks: As the name implies, information flows in one direction, so that the output of a layer becomes the input of the next layer. In this way, network complexity is reduced to a great extent. Yet, one disadvantage is that, given the inherent assumption that the order of inputs is not significant, some tasks, such as those related to natural language processing, cannot be handled well.
- Recurrent networks: Input is processed sequentially by a family of neural networks so that feedback connections can occur. There is a delay constraint on the feedback mechanism, so that the output of a hidden layer can only be fed back as input at the next step rather than within the current step itself.
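The difference between the two kinds of information flow can be sketched as follows. This is a minimal illustration under stated assumptions: the tanh activation, the tiny layer sizes, and every weight value are made up for the example, and the function names are hypothetical. The feedforward pass treats each input independently, while the recurrent neuron carries a state from one step to the next, so the order of the sequence matters.

```python
import math

def layer(inputs, weights, biases):
    """One fully connected layer of tanh neurons; weights[j] belongs to neuron j."""
    return [math.tanh(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def feedforward(x, layers):
    """One-directional flow: each layer's output is the next layer's input."""
    for weights, biases in layers:
        x = layer(x, weights, biases)
    return x

def recurrent(sequence, w_in, w_rec, bias):
    """Single recurrent neuron: the state of the previous step feeds back in."""
    h = 0.0
    states = []
    for x in sequence:
        h = math.tanh(w_in * x + w_rec * h + bias)  # delayed feedback via h
        states.append(h)
    return states

# Illustrative parameters (assumptions): 1 input -> 2 hidden -> 1 output.
net = [([[0.8], [-0.4]], [0.1, 0.0]),
       ([[1.0, -1.0]], [0.2])]

seq_a = [1.0, 0.0, 0.0]
seq_b = [0.0, 0.0, 1.0]  # same items, different order

ff_a = [feedforward([x], net)[0] for x in seq_a]
ff_b = [feedforward([x], net)[0] for x in seq_b]
rnn_a = recurrent(seq_a, w_in=1.0, w_rec=0.5, bias=0.0)
rnn_b = recurrent(seq_b, w_in=1.0, w_rec=0.5, bias=0.0)

print("feedforward outputs (sorted):", sorted(ff_a), sorted(ff_b))
print("recurrent final states:", rnn_a[-1], rnn_b[-1])
```

The feedforward network produces the same set of outputs for both orderings, whereas the final state of the recurrent neuron differs, which is why sequential tasks tend to favor recurrent designs.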
Ideally, a network should be fully connected, so that every neuron in one layer is connected to every neuron in the adjacent layers, enabling a maximum amount of interaction. Yet one caveat to such an architecture is that some connections might need to be removed depending on the desired output of the algorithm, and the learning algorithm might be exposed to additional difficulties, as adding neurons significantly increases the number of weights and biases to be learned. In a convolutional network, by contrast, connectivity is limited, as a neuron in a specific layer is only connected to a set of spatially adjacent neurons in the previous layer. While increasing connectivity adds to the complexity of an ANN, which makes learning harder, a convolutional network decreases connectivity and often obtains better results. It should also be kept in mind that such theoretical classifications are usually superficial, as in practice engineers and researchers routinely combine network architectures. To give a specific example, a deep network may possess two convolutional layers followed by one fully-connected layer. In general, mixing different network architectures tends to yield better accuracy. Although the design of a network architecture might sound complicated, training these networks is more difficult still. Although the principles used to teach a single neuron are similar to those used to teach a whole network, the network level adds extra complexity, which requires an additional step.
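How strongly connectivity drives the number of weights and biases to be learned can be estimated with a rough count. The layer sizes below (a 28 by 28 single-channel input, a 5 by 5 neighborhood, and 24 by 24 output neurons) are assumptions chosen only for illustration, and the helper names are hypothetical. The last case additionally assumes that all neurons share one set of weights, which is the usual convention for convolutional layers but is not stated in the text above.

```python
def dense_params(n_in, n_out):
    """Fully connected: every output neuron has a weight per input, plus a bias."""
    return n_in * n_out + n_out

def local_params(n_out, kernel_h, kernel_w, in_channels=1):
    """Limited connectivity: each output neuron sees only a small neighborhood."""
    return n_out * (kernel_h * kernel_w * in_channels + 1)

def conv_params(kernel_h, kernel_w, in_channels=1, out_channels=1):
    """Convolutional (assumed shared weights): one kernel and bias per feature map."""
    return kernel_h * kernel_w * in_channels * out_channels + out_channels

n_in = 28 * 28    # illustrative input size
n_out = 24 * 24   # illustrative number of neurons in the next layer

fully_connected = dense_params(n_in, n_out)
locally_connected = local_params(n_out, 5, 5)
convolutional = conv_params(5, 5)

print("fully connected:   ", fully_connected)
print("locally connected: ", locally_connected)
print("convolutional:     ", convolutional)
```

Under these assumptions the fully connected layer needs 452,160 parameters, the locally connected one 14,976, and the weight-sharing convolutional one only 26, which gives a sense of why reduced connectivity eases learning.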
One of the most challenging aspects of ANN is that it does not enable dynamic knowledge: it cannot discover something new by manipulating inputs or outputs, but can merely learn something previously known. So, it might be worth pondering whether using ANN is appropriate in every case. Also, assuming that by using ANNs we could imitate the human brain would be misleading. While the inputs of neurons might correspond to the experiences and knowledge of a person, the output would be that person's perception of reality. The difference between neuron outputs and correct outputs represents the gap between the person's perception and reality. According to ANN researchers, one should look at the gap between reality and perception rather than the correctness of the perception itself (Nielsen 2015). To give a specific example, if a person gives two wrong answers (125 and 20) to the question of finding the square of 5, the answers should not be treated on an equal basis, as 20 is closer to reality even though it is not correct. So, the count of correct answers is a poor guide for learning: small changes in weights and biases usually leave it unchanged, whereas the size of the gap responds to them.
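The example of the two wrong answers can be worked through numerically. The sketch below assumes a quadratic cost, one half of the squared gap, which is one common choice and not the only possible one; the function name is hypothetical.

```python
def quadratic_cost(answer, target):
    """Half the squared gap between the given answer and the correct one."""
    return 0.5 * (answer - target) ** 2

target = 5 ** 2  # the correct answer, 25

answers = (125, 20)
costs = {answer: quadratic_cost(answer, target) for answer in answers}
correct_count = sum(1 for answer in answers if answer == target)

print("number of correct answers:", correct_count)
for answer, cost in costs.items():
    print("answer", answer, "-> cost", cost)
```

Both answers score zero on correctness, yet the cost is 5000.0 for 125 and 12.5 for 20, so only the cost distinguishes the answer that is closer to reality.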
Another challenge for ANN is its inherent assumption of representing human consciousness. According to the prominent scholar Bandura (2006), the following core properties distinguish human beings from automatons:
- intentionality,
- forethought,
- self-reactiveness, and
- self-reflectiveness.
These four properties are generally classified as self-regulation and metacognitive processes that enable individuals to decide on their goals and exert control over their processing activities. For purposes of illustration, the following analogy could be used:
- The software engineer represents the consciousness setting the goals and plans for experiences.
- A learnable software represents a neural pattern written in the brain. This learnable software is a dynamic program which can automatically discover mistakes and rewrite itself under the engineer’s supervision.
- The engineer gives instructions to the learnable software but does not engage in writing the software manually as he is not a programmer and does not even know how the software is written.
- Once the software is written ‘correctly’, the consciousness releases its control over the written software and the software is working deliberately.
- Only when something unexpected happens does the consciousness come back to manage the process of rewriting the software.
According to Bandura (2006), such a perspective would strip the individual of identity. Consciousness is an emergent property of brain activities which cannot merely be reduced to the properties of neuronal activity. In other words, consciousness is a higher-level activity that results from lower-level neural activities, although its properties exceed theirs. According to the ANN design logic, by contrast, consciousness is seen as a completely distinct entity.
Use of back-propagation along with the gradient descent algorithm helps to reduce the gap between the correct output and the network output by calculating the cost function and its gradient, so that each neuron in the network moves closer toward learning its right weights and bias. Once gradient descent arrives at a point where it can no longer reduce the cost, the network has reached its closest approximation to the correct output. Yet, given a network of millions of real neurons within the human mind, what are those right weights and bias that a single artificial neuron learns? According to cognitive science, a meaning within the human mind is distributed across a group of connections or patterns. The inner entities might be well organized and serve the purpose of the whole network, yet by looking at a single entity or a small number of entities, one may fall under the illusion of finding conflicting and contradictory ideas within the whole network.
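For a single sigmoid neuron, the interplay of cost, gradient and weight updates can be sketched in a few lines. This is a minimal illustration, assuming a quadratic cost and a single training example; the starting weight, bias, learning rate and number of epochs are arbitrary values chosen for the example, and the function names are hypothetical. In a full network, back-propagation would carry the same kind of gradient backwards through every layer.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(x, y, w, b, eta, epochs):
    """Gradient descent on one sigmoid neuron with cost 0.5 * (a - y) ** 2."""
    history = []
    for _ in range(epochs):
        a = sigmoid(w * x + b)            # neuron output
        history.append(0.5 * (a - y) ** 2)
        delta = (a - y) * a * (1.0 - a)   # gradient of the cost w.r.t. z
        w -= eta * delta * x              # step against the gradient
        b -= eta * delta
    return w, b, history

# Illustrative setup (assumptions): the neuron should map input 1 to output 0.
x, y = 1.0, 0.0
w_start, b_start = 0.6, 0.9
w_final, b_final, history = train(x, y, w_start, b_start, eta=0.15, epochs=300)

output_before = sigmoid(w_start * x + b_start)
output_after = sigmoid(w_final * x + b_final)

print("cost:  ", history[0], "->", history[-1])
print("output:", output_before, "->", output_after)
```

Each update moves the weight and bias a little in the direction that lowers the cost, so the gap between the neuron's output and the correct output shrinks step by step.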
As Nielsen (2015) asserts, in the early days of AI, the effort to build an AI was optimistically expected to help us understand the principles underpinning the real functioning of the human brain. Yet, perhaps we will end up understanding neither the brain nor AI.