Deep learning is an advanced form of machine learning that tries to emulate the way the human brain learns, using artificial neural networks that process numeric inputs rather than electrochemical stimuli.
The incoming nerve connections are replaced by numeric inputs that are typically identified as x. When there’s more than one input value, x is considered a vector with elements named x1, x2, and so on.
Associated with each x value is a weight (w), which is used to strengthen or weaken the effect of the x value to simulate learning. Additionally, a bias (b) input is added to enable fine-grained control over the network. During the training process, the w and b values will be adjusted to tune the network so that it “learns” to produce correct outputs.
The neuron itself encapsulates a function that calculates the weighted sum of the x and w values, plus the bias b. This function is in turn enclosed in an activation function that constrains the result (often to a value between 0 and 1) to determine whether or not the neuron passes an output onto the next layer of neurons in the network.
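As a minimal sketch, a single artificial neuron with a sigmoid activation function might be implemented in plain Python like this (the input, weight, and bias values shown are purely illustrative):

```python
import math

def neuron(x, w, b):
    """A single artificial neuron: the weighted sum of the inputs plus the bias,
    passed through a sigmoid activation that constrains the result to between 0 and 1."""
    z = sum(xi * wi for xi, wi in zip(x, w)) + b   # weighted sum plus bias
    return 1 / (1 + math.exp(-z))                  # sigmoid activation

# Example: three inputs, three weights, and a bias (values are illustrative)
print(neuron(x=[0.5, 0.2, 0.9], w=[0.8, -0.4, 0.3], b=0.1))
```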
The deep neural network (DNN) model for the classifier consists of multiple layers of artificial neurons. An example would be classifying a sample as one of three penguin species based on four attributes. In this case, there are four layers:

- An input layer with a neuron for each expected input (x) value.
- Two hidden layers of neurons.
- An output layer containing three neurons - one for each class probability (y) value to be predicted by the model.

You can decide how many hidden layers you want to include and how many neurons are in each of them; but you have no control over the input and output values for these layers - these are determined by the model training process.
Because of the layered architecture of the network, this kind of model is sometimes referred to as a multilayer perceptron. Additionally, notice that all neurons in the input and hidden layers are connected to all neurons in the subsequent layers - this is an example of a fully connected network.
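A fully connected network like this could be sketched in PyTorch as follows (the class name, the hidden layer size of 10, and the choice of ReLU activations are illustrative assumptions, not requirements):

```python
import torch.nn as nn

# A fully connected network (multilayer perceptron) for the penguin example:
# four input features, two hidden layers, and three output neurons (one per species).
class PenguinNet(nn.Module):
    def __init__(self, n_features=4, n_hidden=10, n_classes=3):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(n_features, n_hidden),  # input layer -> hidden layer 1
            nn.ReLU(),
            nn.Linear(n_hidden, n_hidden),    # hidden layer 1 -> hidden layer 2
            nn.ReLU(),
            nn.Linear(n_hidden, n_classes),   # hidden layer 2 -> output layer
        )

    def forward(self, x):
        return self.layers(x)
```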
The training process for a deep neural network consists of multiple iterations, called epochs. For the first epoch, you start by assigning random initialization values for the weight (w) and bias (b) values. Then the process is as follows:

1. Features for the training observations are fed into the input layer, usually in batches.
2. The neurons apply their weights and biases, and the activation functions pass the results from layer to layer until the output layer produces predicted (y) values.
3. The predictions are compared to the known true values, and the aggregate difference between them (the loss) is calculated.
4. Based on the loss, revised weight and bias values that should reduce the loss are calculated and propagated back through the layers of the network (a technique known as backpropagation).
5. The next epoch repeats the forward pass with the revised weights and biases, gradually reducing the loss and improving the model's predictions.
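Here is a minimal sketch of that loop in PyTorch, using randomly generated stand-in data rather than a real dataset (the model shape, learning rate, and epoch count are illustrative assumptions):

```python
import torch
import torch.nn as nn

# Illustrative model, loss function, and optimizer
model = nn.Sequential(nn.Linear(4, 10), nn.ReLU(), nn.Linear(10, 3))
loss_fn = nn.CrossEntropyLoss()                           # compares predictions with the known class labels
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

x_train = torch.randn(100, 4)                             # 100 observations with 4 features each
y_train = torch.randint(0, 3, (100,))                     # 100 known class labels (0, 1, or 2)

for epoch in range(50):                                   # each full pass over the data is one epoch
    optimizer.zero_grad()                                 # reset gradients from the previous epoch
    y_pred = model(x_train)                               # forward pass: calculate predicted outputs
    loss = loss_fn(y_pred, y_train)                       # calculate the loss
    loss.backward()                                       # backpropagation: gradients of the loss w.r.t. weights
    optimizer.step()                                      # adjust weights and biases to reduce the loss
```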
An optimizer is an algorithm used to minimize the loss function of a neural network. It does this by updating the weights of the network based on the gradients of the loss function with respect to the weights. There are many different optimizers available, such as Stochastic Gradient Descent (SGD), Adam, Adagrad, and RMSprop, each with its own strengths and weaknesses.
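For example, the optimizers named above can each be constructed against a model's parameters in PyTorch; the learning rate and momentum values below are illustrative, not recommendations:

```python
import torch
import torch.nn as nn

# A simple stand-in model whose parameters the optimizers will update
model = nn.Linear(4, 3)

sgd     = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)  # stochastic gradient descent
adam    = torch.optim.Adam(model.parameters(), lr=0.001)              # adaptive moment estimation
adagrad = torch.optim.Adagrad(model.parameters(), lr=0.01)            # per-parameter adaptive rates
rmsprop = torch.optim.RMSprop(model.parameters(), lr=0.001)           # moving average of squared gradients
```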
The learning rate is a hyperparameter that determines how quickly the weights of the network are updated during training. It controls the step size the optimizer takes in the weight space at each iteration. If the learning rate is too high, the optimizer may overshoot the minimum of the loss function and fail to converge; if it is too low, training progresses slowly and may take many more iterations to converge.
In practice, finding the optimal learning rate for a given optimizer can be challenging, and a common approach is to use a learning rate schedule or adaptive learning rate methods, such as cosine annealing or learning rate decay, to gradually reduce the learning rate during training to help the optimizer converge to a good solution.
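For instance, a cosine annealing schedule might be attached to an optimizer in PyTorch as sketched below (the initial learning rate and the 50-epoch horizon are assumptions for illustration):

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 3)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=50)

for epoch in range(50):
    # ... forward pass, loss calculation, and loss.backward() go here ...
    optimizer.step()                         # update the weights
    scheduler.step()                         # then reduce the learning rate for the next epoch
    current_lr = scheduler.get_last_lr()[0]  # the learning rate the next epoch will use
```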
Machine learning models that work with images are the foundation for an area of artificial intelligence called computer vision, and deep learning techniques have been responsible for driving amazing advances in this area over recent years.
A convolutional neural network (CNN) typically works by extracting features from images, and then feeding those features into a fully connected neural network to generate a prediction. The feature extraction layers in the network have the effect of reducing the number of features from the potentially huge array of individual pixel values to a smaller feature set that supports label prediction.
CNNs consist of multiple layers, each performing a specific task in extracting features or predicting labels - for example, convolutional layers that apply filters to the image, pooling layers that downsample the resulting feature maps, dropout layers that help prevent overfitting, a flattening layer that converts the feature maps into a vector, and fully connected layers that generate the label predictions.
A convolutional neural network (CNN) for image classification is typically composed of multiple layers that extract features, followed by a final fully connected layer that classifies images based on those features.
Conceptually, this neural network consists of two distinct sets of layers:
The feature extraction layers apply convolutional filters and pooling to emphasize edges, corners, and other patterns in the images that can be used to differentiate them, and in theory should work for any set of images with the same dimensions as the input layer of the network. The prediction layer maps the features to a set of outputs that represent probabilities for each class label you want to use to classify the images.
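The sketch below shows what such a two-part CNN might look like in PyTorch, assuming 3-channel 64x64 input images and ten class labels (both assumptions are purely illustrative):

```python
import torch.nn as nn

class SimpleCNN(nn.Module):
    def __init__(self, n_classes=10):
        super().__init__()
        # Feature extraction layers: convolution and pooling emphasize visual patterns
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),   # 3 input channels -> 16 feature maps
            nn.ReLU(),
            nn.MaxPool2d(2),                              # 64x64 -> 32x32
            nn.Conv2d(16, 32, kernel_size=3, padding=1),  # 16 -> 32 feature maps
            nn.ReLU(),
            nn.MaxPool2d(2),                              # 32x32 -> 16x16
        )
        # Prediction layers: flatten the features and map them to class outputs
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 16 * 16, n_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))
```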
By separating the network into these types of layers, you can take the feature extraction layers from a model that has already been trained and append one or more layers that use the extracted features to predict the appropriate class labels for your images. This approach, commonly known as transfer learning, enables you to keep the pre-trained weights for the feature extraction layers, which means you only need to train the prediction layers you have added.
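A minimal sketch of this approach, assuming a pre-trained ResNet-18 from torchvision and three target classes (both are assumptions chosen for illustration):

```python
import torch.nn as nn
from torchvision import models

# Load a model pre-trained on ImageNet
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze the pre-trained feature extraction layers so their weights are not updated
for param in model.parameters():
    param.requires_grad = False

# Replace the final fully connected layer with a new, trainable prediction layer
model.fc = nn.Linear(model.fc.in_features, 3)
# Only the parameters of model.fc will now be trained.
```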
PyTorch and TensorFlow are two of the most popular deep learning frameworks used for building and training neural networks.
Both PyTorch and TensorFlow have their own strengths and weaknesses, and the choice between the two often depends on the specific use case and personal preferences. While PyTorch offers dynamic graphs and an intuitive interface, TensorFlow offers more advanced features and a wider range of tools for distributed computing and deployment. Ultimately, both frameworks are powerful tools for building and training deep learning models.
In this module you learned about the fundamental principles of deep learning, and how to create deep neural network models using PyTorch or TensorFlow. You also explored the use of convolutional neural networks to create image classification models.
Source: Microsoft Learn