1. Why Deep Learning for NLP?

Traditional NLP relied heavily on manual feature engineering (e.g., n-grams, TF-IDF). Deep learning models learn representations of text automatically. Key advantages include:

  • Word Embeddings: Representing words as dense vectors, so that words used in similar contexts end up close together in the vector space.
  • Context Awareness: Understanding the meaning of a word based on its surrounding words.
  • End-to-End Learning: Mapping raw text directly to outputs (e.g., classification labels) without complex pipeline steps.
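
To make "close in space" concrete, here is a minimal sketch that compares word vectors with cosine similarity. The three-dimensional vectors are hand-picked for illustration (an assumption, not learned values); real embeddings are learned during training and usually have tens to hundreds of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy embeddings (hand-picked, not learned)
embeddings = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.85, 0.9, 0.15],
    "apple": [0.1, 0.2, 0.95],
}

sim_related = cosine_similarity(embeddings["king"], embeddings["queen"])
sim_unrelated = cosine_similarity(embeddings["king"], embeddings["apple"])
print(f"king vs queen: {sim_related:.3f}")
print(f"king vs apple: {sim_unrelated:.3f}")
```

With these toy values the related pair scores close to 1, while the unrelated pair scores much lower.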

2. Key Architectures

Common neural network structures used in NLP:

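  • RNNs & LSTMs: Process text one token at a time, carrying a hidden state forward, so word order is modeled directly. LSTMs add gating to retain information over longer spans, though very long-range dependencies remain difficult and the sequential processing limits parallelism.
  • Transformers: Use "Attention" mechanisms that let every token weigh every other token in the sequence in parallel, rather than step by step. The basis for models like BERT and GPT.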
  • CNNs: Sometimes used for text classification to detect local patterns (like key phrases).
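
The attention idea mentioned for Transformers can be sketched in a few lines of plain Python. This is a simplified illustration of scaled dot-product self-attention, assuming the queries, keys, and values are the raw token vectors themselves; a real Transformer first applies learned linear projections and uses multiple attention heads. The 2-dimensional token vectors are made up for the example.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention over lists of vectors.

    Returns (outputs, weights); weights[i][j] is how much token i attends to token j.
    """
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Similarity of this query with every key, scaled by sqrt(d_k)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # Output is the attention-weighted sum of the value vectors
        out = [sum(w * v[dim] for w, v in zip(weights, values))
               for dim in range(len(values[0]))]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights

# Hypothetical 2-dimensional vectors for a 3-token sentence
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = attention(tokens, tokens, tokens)  # self-attention: Q = K = V
for row in weights:
    print([round(w, 3) for w in row])
```

Each row of `weights` sums to 1, and every row is computed independently of the others, which is what allows the whole sequence to be processed in parallel.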

3. Common Tasks

Deep learning is widely applied to NLP tasks such as:

  • Sentiment Analysis (Positive/Negative classification)
  • Machine Translation (e.g., English to Spanish)
  • Named Entity Recognition (Extracting names, places, dates)
  • Text Generation (Chatbots, auto-completion)

4. Data Preparation

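Text must be converted into numbers before feeding it into a neural network. This usually involves tokenization (splitting text into words or subwords), indexing (mapping each token to an integer ID), and padding (extending sequences to a common length so they can be batched).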

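# Example: Basic Tokenization and Padding with Keras
# Note: Tokenizer is deprecated in recent TensorFlow releases in favor of
# tf.keras.layers.TextVectorization; it is used here because it makes each step explicit.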
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

sentences = [
    "I love deep learning",
    "NLP is fascinating but hard"
]

# 1. Tokenize
tokenizer = Tokenizer(num_words=100, oov_token="<OOV>")
tokenizer.fit_on_texts(sentences)
word_index = tokenizer.word_index

# 2. Convert to sequences
sequences = tokenizer.texts_to_sequences(sentences)

# 3. Pad sequences (make them same length)
padded = pad_sequences(sequences, padding='post', maxlen=6)

print(padded)
# Output: [[ 2  3  4  5  0  0]
#          [ 6  7  8  9 10  0]]
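
To show what the three steps do, here is a rough pure-Python sketch of the same pipeline. It is a simplified stand-in, not the Keras implementation: it splits on whitespace only (no punctuation filtering), and the helper names `build_word_index`, `texts_to_sequences`, and `pad_post` are chosen for this illustration.

```python
def build_word_index(texts, oov_token="<OOV>"):
    """Map each word to an integer; 0 is reserved for padding, 1 for unknown words."""
    counts = {}
    for text in texts:
        for word in text.lower().split():
            counts[word] = counts.get(word, 0) + 1
    # Most frequent words get the smallest indices; ties keep first-seen order
    ordered = sorted(counts, key=lambda w: -counts[w])
    word_index = {oov_token: 1}
    for i, word in enumerate(ordered, start=2):
        word_index[word] = i
    return word_index

def texts_to_sequences(texts, word_index, oov_token="<OOV>"):
    """Replace every word with its index, falling back to the OOV index."""
    oov = word_index[oov_token]
    return [[word_index.get(w, oov) for w in t.lower().split()] for t in texts]

def pad_post(sequences, maxlen, value=0):
    """Right-pad (or cut) every sequence to exactly maxlen items.

    Keeps the first maxlen tokens; Keras pad_sequences by default truncates
    from the front instead.
    """
    return [seq[:maxlen] + [value] * (maxlen - len(seq)) for seq in sequences]

sentences = ["I love deep learning", "NLP is fascinating but hard"]
word_index = build_word_index(sentences)
padded = pad_post(texts_to_sequences(sentences, word_index), maxlen=6)
print(padded)
# → [[2, 3, 4, 5, 0, 0], [6, 7, 8, 9, 10, 0]]

# A word never seen during fitting maps to the OOV index 1
print(texts_to_sequences(["I love pizza"], word_index))
# → [[2, 3, 1]]
```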

5. TensorFlow (Keras) Example

Here is a simple Sentiment Analysis model using an Embedding layer and Global Average Pooling.

import tensorflow as tf
from tensorflow import keras

vocab_size = 10000
embedding_dim = 16
max_length = 100

model = keras.Sequential([
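    # Input: sequences of max_length integer token IDs
    keras.Input(shape=(max_length,)),
    # Embedding: maps each non-negative integer ID to a dense vector of fixed size.
    # (The older input_length argument is deprecated in recent Keras versions.)
    keras.layers.Embedding(vocab_size, embedding_dim),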

    # Average the vectors to get a single vector for the sentence
    keras.layers.GlobalAveragePooling1D(),

    # Dense layers for classification
    keras.layers.Dense(24, activation='relu'),
    keras.layers.Dense(1, activation='sigmoid')
])

model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()

# model.fit(padded_train, training_labels, epochs=10, ...)
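
Once the model is built, model.summary() lists a parameter count per layer. The arithmetic behind those counts can be checked by hand; the sketch below assumes the layer sizes used above (`hidden_units` is a name introduced here for the 24-unit Dense layer).

```python
vocab_size = 10000
embedding_dim = 16
hidden_units = 24

# Embedding: one embedding_dim-sized vector per vocabulary entry, no bias
embedding_params = vocab_size * embedding_dim
# GlobalAveragePooling1D only averages, so it has no trainable weights
pooling_params = 0
# Dense: (inputs * units) weights plus one bias per unit
dense1_params = embedding_dim * hidden_units + hidden_units
dense2_params = hidden_units * 1 + 1

total = embedding_params + pooling_params + dense1_params + dense2_params
print(embedding_params, dense1_params, dense2_params, total)
# → 160000 408 25 160433
```

Almost all of the parameters sit in the embedding table, which is typical for small text classifiers with a large vocabulary.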

6. PyTorch Example

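The same idea in PyTorch: average the token embeddings of each sequence, then classify with two fully connected layers. This version outputs one logit per class instead of a single sigmoid probability.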

import torch
import torch.nn as nn
import torch.nn.functional as F

class TextClassifier(nn.Module):
    def __init__(self, vocab_size, embed_dim, num_classes):
        super().__init__()
        # EmbeddingBag looks up each token's embedding and averages them per
        # sequence (mode="mean"); unlike nn.Embedding, it accepts offsets.
        self.embedding = nn.EmbeddingBag(vocab_size, embed_dim, mode="mean")
        # Fully connected layers
        self.fc1 = nn.Linear(embed_dim, 16)
        self.fc2 = nn.Linear(16, num_classes)

    def forward(self, text, offsets):
        # text: 1-D LongTensor of token indices for all sequences, concatenated
        # offsets: 1-D LongTensor with the starting index of each sequence in text
        embedded = self.embedding(text, offsets)  # shape: (batch_size, embed_dim)
        x = F.relu(self.fc1(embedded))
        return self.fc2(x)  # raw logits; pair with nn.CrossEntropyLoss

# Parameters
VOCAB_SIZE = 10000
EMBED_DIM = 64
NUM_CLASSES = 2 # e.g., Positive / Negative

model = TextClassifier(VOCAB_SIZE, EMBED_DIM, NUM_CLASSES)
print(model)
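
The `text` and `offsets` arguments of forward describe a batch of variable-length sequences packed into one flat list. As a small sketch of how such inputs might be prepared, the hypothetical helper below (`flatten_with_offsets` is a name made up for this example) builds both lists in plain Python.

```python
def flatten_with_offsets(sequences):
    """Concatenate variable-length sequences and record where each one starts."""
    flat, offsets = [], []
    for seq in sequences:
        offsets.append(len(flat))  # start position of this sequence in `flat`
        flat.extend(seq)
    return flat, offsets

# Hypothetical batch of two tokenized sentences (integer word indices)
batch = [[2, 3, 4, 5], [6, 7, 8, 9, 10]]
flat, offsets = flatten_with_offsets(batch)
print(flat)     # → [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(offsets)  # → [0, 4]
```

Wrapped as torch.tensor(flat, dtype=torch.long) and torch.tensor(offsets, dtype=torch.long), these are the kind of inputs a layer such as nn.EmbeddingBag expects; it pools each slice into one vector per sequence, so no padding is needed.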

7. Further Reading

Note: Modern NLP has largely shifted toward Transformer-based models (like BERT, RoBERTa, GPT) for complex tasks. Simple models (Embedding + Dense/LSTM) are great for learning the fundamentals, while pretrained Transformers generally give the strongest results, particularly when large datasets are available.