1. Why Deep Learning for NLP?
Traditional NLP relied heavily on manual feature engineering (e.g., n-grams, TF-IDF). Deep learning models learn representations of text automatically. Key advantages include:
- Word Embeddings: Representing words as dense vectors where similar words are close in space.
- Context Awareness: Understanding the meaning of a word based on its surrounding words.
- End-to-End Learning: Mapping raw text directly to outputs (e.g., classification labels) without complex pipeline steps.
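The idea that similar words sit close together in embedding space can be illustrated with cosine similarity. The sketch below is only an illustration: the three-dimensional vectors are hand-picked, hypothetical values, whereas real embeddings are learned from data and typically have tens to hundreds of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings, chosen by hand for illustration only.
embeddings = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.8, 0.9, 0.1],
    "apple": [0.1, 0.2, 0.9],
}

sim_royal = cosine_similarity(embeddings["king"], embeddings["queen"])
sim_fruit = cosine_similarity(embeddings["king"], embeddings["apple"])
print(f"king vs queen: {sim_royal:.3f}")
print(f"king vs apple: {sim_fruit:.3f}")
```

A value near 1 means the two vectors point in almost the same direction; with these toy vectors, "king" scores much closer to "queen" than to "apple".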
2. Key Architectures
Common neural network structures used in NLP:
- RNNs & LSTMs: Process text sequentially, one token at a time, which makes them sensitive to word order. LSTMs add gating to retain dependencies across longer spans.
- Transformers: Use attention mechanisms to weigh the relevance of every word to every other word in parallel rather than step by step. They are the basis for models like BERT and GPT.
- CNNs: Sometimes used for text classification to detect local patterns (like key phrases).
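As a rough illustration of the attention idea mentioned for Transformers, here is a minimal sketch of scaled dot-product attention in plain Python. It is a simplification under stated assumptions: the token vectors are made-up values, and the same vectors are used as queries, keys, and values, whereas a real Transformer first applies learned projections and uses several attention heads.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    highest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # Output is the weighted sum of the value vectors
        out = [sum(w * v[j] for w, v in zip(weights, values))
               for j in range(len(values[0]))]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights

# Toy 2-d vectors for a three-token sentence (hypothetical values).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = attention(tokens, tokens, tokens)
for row in weights:
    print([round(w, 3) for w in row])
```

Each printed row shows how much one token attends to every token in the sentence. All rows are computed independently of one another, which is what allows the real operation to run in parallel.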
3. Common Tasks
Deep learning excels at various NLP tasks, such as:
- Sentiment Analysis (Positive/Negative classification)
- Machine Translation (e.g., English to Spanish)
- Named Entity Recognition (Extracting names, places, dates)
- Text Generation (Chatbots, auto-completion)
4. Data Preparation
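Text must be converted into numbers before it is fed into a neural network. This usually involves tokenization (splitting text into units such as words), indexing (mapping each unit to an integer), and padding (extending sequences to a common length).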
# Example: Basic Tokenization and Padding with Keras
# Note: this Tokenizer utility is deprecated in recent TensorFlow releases;
# keras.layers.TextVectorization is the suggested replacement for new code.
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

sentences = [
    "I love deep learning",
    "NLP is fascinating but hard"
]

# 1. Tokenize: build the vocabulary (index 1 is reserved for the OOV token)
tokenizer = Tokenizer(num_words=100, oov_token="<OOV>")
tokenizer.fit_on_texts(sentences)
word_index = tokenizer.word_index

# 2. Convert each sentence to a sequence of word indices
sequences = tokenizer.texts_to_sequences(sentences)

# 3. Pad sequences with trailing zeros so they all have length 6
padded = pad_sequences(sequences, padding='post', maxlen=6)
print(padded)
# Output:
# [[ 2  3  4  5  0  0]
#  [ 6  7  8  9 10  0]]
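To make the three steps explicit without depending on Keras, the following is a simplified, dependency-free sketch. The helper names `build_word_index` and `texts_to_padded` are hypothetical. It also differs from the Keras utilities in ways worth noting: words are numbered in order of first appearance rather than by frequency (the two coincide when every word occurs once, as here), and punctuation is not filtered.

```python
def build_word_index(sentences, oov_token="<OOV>"):
    """Map each word to an integer; 0 is kept for padding, 1 for unknown words."""
    word_index = {oov_token: 1}
    for sentence in sentences:
        for word in sentence.lower().split():
            if word not in word_index:
                word_index[word] = len(word_index) + 1
    return word_index

def texts_to_padded(sentences, word_index, maxlen, oov_token="<OOV>"):
    """Convert sentences to index lists, then pad with trailing zeros."""
    padded = []
    for sentence in sentences:
        ids = [word_index.get(w, word_index[oov_token])
               for w in sentence.lower().split()]
        # Keep the last maxlen tokens if the sentence is too long
        # (this mirrors the default truncating='pre' of pad_sequences).
        ids = ids[-maxlen:]
        padded.append(ids + [0] * (maxlen - len(ids)))
    return padded

sentences = [
    "I love deep learning",
    "NLP is fascinating but hard",
]
word_index = build_word_index(sentences)
print(texts_to_padded(sentences, word_index, maxlen=6))
# [[2, 3, 4, 5, 0, 0], [6, 7, 8, 9, 10, 0]]

# A word not seen while building the vocabulary maps to the OOV index 1.
print(texts_to_padded(["I love transformers"], word_index, maxlen=6))
# [[2, 3, 1, 0, 0, 0]]
```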
5. TensorFlow (Keras) Example
Here is a simple Sentiment Analysis model using an Embedding layer and Global Average Pooling.
from tensorflow import keras

vocab_size = 10000
embedding_dim = 16
max_length = 100

model = keras.Sequential([
    # Declare the input shape up front so model.summary() can report output shapes
    # (the Embedding argument input_length is deprecated in recent Keras versions)
    keras.Input(shape=(max_length,)),
    # Embedding: Turns non-negative integers (indexes) into dense vectors of fixed size.
    keras.layers.Embedding(vocab_size, embedding_dim),
    # Average the word vectors to get a single vector for the sentence
    keras.layers.GlobalAveragePooling1D(),
    # Dense layers for classification
    keras.layers.Dense(24, activation='relu'),
    # Sigmoid output: probability of the positive class
    keras.layers.Dense(1, activation='sigmoid')
])

model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()

# model.fit(padded_train, training_labels, epochs=10, ...)
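As a sanity check on what `model.summary()` reports, the trainable parameter count of this model can be worked out by hand. The short sketch below does the arithmetic using the layer sizes from the example; the variable names are only illustrative.

```python
vocab_size = 10000
embedding_dim = 16
hidden_units = 24

# Embedding: one vector of size embedding_dim per vocabulary entry
embedding_params = vocab_size * embedding_dim
# Global average pooling only averages, so it has no weights
pooling_params = 0
# Dense(24): a weight matrix plus one bias per unit
dense_hidden_params = embedding_dim * hidden_units + hidden_units
# Dense(1): one weight per hidden unit plus a single bias
dense_output_params = hidden_units * 1 + 1

total_params = (embedding_params + pooling_params
                + dense_hidden_params + dense_output_params)
print(embedding_params, dense_hidden_params, dense_output_params, total_params)
# 160000 408 25 160433
```

Almost all of the parameters sit in the embedding table, which is typical for small text classifiers with a large vocabulary.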
6. PyTorch Example
A similar text classification architecture implemented in PyTorch.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TextClassifier(nn.Module):
    def __init__(self, vocab_size, embed_dim, num_classes):
        super().__init__()
        # EmbeddingBag looks up the embeddings and averages them per sequence.
        # (A plain nn.Embedding does not accept an offsets argument.)
        self.embedding = nn.EmbeddingBag(vocab_size, embed_dim, mode='mean')
        # Fully connected layers
        self.fc1 = nn.Linear(embed_dim, 16)
        self.fc2 = nn.Linear(16, num_classes)

    def forward(self, text, offsets):
        # text: 1-D tensor of token indices for all sequences, concatenated
        # offsets: starting index of each sequence within text
        embedded = self.embedding(text, offsets)
        x = F.relu(self.fc1(embedded))
        # Return raw logits (nn.CrossEntropyLoss applies softmax internally)
        return self.fc2(x)

# Parameters
VOCAB_SIZE = 10000
EMBED_DIM = 64
NUM_CLASSES = 2  # e.g., Positive / Negative

model = TextClassifier(VOCAB_SIZE, EMBED_DIM, NUM_CLASSES)
print(model)
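The `text` and `offsets` arguments of `forward` follow the flattened layout that `torch.nn.EmbeddingBag` expects: all sequences concatenated into one list, plus the position where each sequence starts. A minimal sketch of building that layout from a batch of index lists is shown below; the helper name `flatten_with_offsets` is hypothetical, and plain Python lists are used so the snippet runs without PyTorch.

```python
def flatten_with_offsets(sequences):
    """Concatenate sequences into one list and record where each one starts."""
    text, offsets = [], []
    position = 0
    for seq in sequences:
        offsets.append(position)  # this sequence starts at the current position
        text.extend(seq)
        position += len(seq)
    return text, offsets

# Two sequences of different lengths; no padding is needed in this layout.
batch = [[2, 3, 4, 5], [6, 7, 8, 9, 10]]
text, offsets = flatten_with_offsets(batch)
print(text)     # [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(offsets)  # [0, 4]
```

With PyTorch installed, these lists could be wrapped as `torch.tensor(text, dtype=torch.long)` and `torch.tensor(offsets, dtype=torch.long)` before being passed to the model.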