Many ways of using TensorFlow, PyTorch, and Keras

With TensorFlow, PyTorch, and Keras currently being the predominant deep learning frameworks, it is every data scientist’s mission to master them. Yet the complexity and richness of these frameworks enable many different ways of doing the same task, which can cause confusion. As a signpost, this article is a rundown of the various ways of leveraging these frameworks to perform the same machine learning task. In particular, I will explain how to use each framework to solve Ordinary Least Squares (OLS) at a low, middle, and high level:

  • Low level: TensorFlow and PyTorch
  • Middle level: Keras and PyTorch
  • High level: Keras and Skorch

Let us begin with some preparation code. The following code imports all the relevant packages and generates the training data with NumPy, following the linear model y = xβ + σε.

import numpy as np

import tensorflow as tf
from tensorflow import keras

import torch
from torch import nn
import skorch

N = 30                                           # number of gradient-descent iterations
n, p = 1000, 5                                   # sample size and number of features
β, σ = np.ones((p, 1)), 1.                       # true coefficients and noise level
rng = np.random.Generator(np.random.PCG64(seed=0))

x0 = rng.standard_normal((n, p))                 # design matrix
y0 = x0 @ β + σ * rng.standard_normal((n, 1))    # response: y = xβ + noise
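
Since OLS also has a closed-form solution, NumPy can produce a reference answer directly; the gradient-descent estimates obtained below should land close to it.

## NumPy closed-form reference (optional sanity check)
w_ols, *_ = np.linalg.lstsq(x0, y0, rcond=None)   # least-squares solution of x0 @ w ≈ y0
print("w_ols.T =", w_ols.T)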

Low level: TensorFlow and PyTorch

At the low level, we use TensorFlow and PyTorch as NumPy alternatives, because these frameworks are, first and foremost, implementations of tensors (with automatic differentiation). Thus, we implement algorithms the same way we would with NumPy, only replacing hand-calculated gradients with automatically generated ones.
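
For comparison, here is a minimal plain-NumPy sketch of the same gradient descent with the gradient of the mean squared error derived by hand, dw = -(2/n) xᵀ(y − xw); this hand-calculated step is exactly what the frameworks’ automatic differentiation replaces below.

## NumPy (hand-derived gradient, for comparison)
lr = 1e-1
w = np.zeros((p, 1))
for _ in range(N):
    dw = -2 / n * x0.T @ (y0 - x0 @ w)   # hand-calculated gradient of the mean squared error
    w -= lr * dw                         # gradient-descent update

print("loss =", np.mean((y0 - x0 @ w)**2))
print("w.T =", w.T)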

TensorFlow records the forward computation inside a tf.GradientTape() context and computes the gradient with the tape.gradient() method.

## TensorFlow
x = tf.constant(x0, dtype=tf.float32)
y = tf.constant(y0, dtype=tf.float32)
lr = 1e-1

w = tf.Variable(tf.zeros((p, 1)))
for _ in range(N):
    with tf.GradientTape() as tape:           # record operations for differentiation
        loss = tf.reduce_mean((y - x @ w)**2)
    dw = tape.gradient(loss, w)               # gradient of the loss with respect to w
    w.assign_sub(lr * dw)                     # gradient-descent update

print("loss =", loss.numpy())
print("w.T =", w.numpy().T)

PyTorch computes the gradient by calling the backward() method on the loss; the result accumulates in w.grad.

## PyTorch
x = torch.tensor(x0, dtype=torch.float32)
y = torch.tensor(y0, dtype=torch.float32)
lr = 1e-1

w = torch.zeros((p, 1), requires_grad=True)
for _ in range(N):
    loss = torch.mean((y - x @ w)**2)
    loss.backward()              # populate w.grad
    with torch.no_grad():        # do not track the update itself
        w.sub_(lr * w.grad)      # gradient-descent update
    w.grad = None                # reset the gradient before the next iteration
w.requires_grad_(False)          # detach w for the prints below

print("loss =", loss.item())
print("w.T =", w.numpy().T)

Middle level: Keras and PyTorch

At the middle level, we leverage the prebuilt components provided by TensorFlow (through Keras) and PyTorch. These components include, but are not limited to, neural network layers, loss functions, and optimizers.

## Keras
x = tf.constant(x0, dtype=tf.float32)
y = tf.constant(y0, dtype=tf.float32)

model = keras.Sequential([keras.layers.Dense(1, use_bias=False)])
loss_fn = keras.losses.MeanSquaredError()
optimizer = keras.optimizers.SGD(learning_rate=1e-1)

for _ in range(N):
    with tf.GradientTape() as tape:
        loss = loss_fn(y, model(x))
    gradient = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradient, model.trainable_variables))

print("loss =", loss.numpy())
print("w.T =", model.trainable_variables[0].numpy().T)

## PyTorch
x = torch.tensor(x0, dtype=torch.float32)
y = torch.tensor(y0, dtype=torch.float32)

model = nn.Sequential(nn.Linear(p, 1, bias=False))
loss_fn = nn.functional.mse_loss
optimizer = torch.optim.SGD(model.parameters(), lr=1e-1)

for _ in range(N):
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

print("loss =", loss.item())
print(next(model.parameters()))

High level: Keras and Skorch

At the high level, we use Keras (for TensorFlow) and Skorch (for PyTorch) as machine learning managers. We specify the various building components and let the manager do the training for us. One advantage of using a manager is that it handles the mini-batch optimization for us, as the Keras and Skorch code below shows.
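
Before handing things off to a manager, it helps to see what we would otherwise have to write by hand. The sketch below is roughly that mini-batch loop, in PyTorch and reusing model, loss_fn, and optimizer from the middle-level block; it illustrates plain mini-batch gradient descent, not the frameworks’ actual internals.

## Illustrative only: the kind of mini-batch loop the managers automate
batch_size, epochs = n//10, N//10
for _ in range(epochs):
    perm = torch.randperm(n)                    # reshuffle the data each epoch
    for i in range(0, n, batch_size):
        idx = perm[i:i + batch_size]            # indices of one mini-batch
        loss = loss_fn(model(x[idx]), y[idx])   # forward pass on the mini-batch
        loss.backward()                         # backpropagation
        optimizer.step()                        # parameter update
        optimizer.zero_grad()                   # reset accumulated gradients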

## Keras
x = tf.constant(x0, dtype=tf.float32)
y = tf.constant(y0, dtype=tf.float32)

model = keras.Sequential([keras.layers.Dense(1, use_bias=False)])
model.compile(
    loss = keras.losses.MeanSquaredError(),
    optimizer = keras.optimizers.SGD(learning_rate=1e-1)
)

history = model.fit(x, y, batch_size=n//10, epochs=N//10, verbose=0)

print("loss =", history.history["loss"][-1])
print("w.T =", model.trainable_variables[0].numpy().T)

## Skorch
x = torch.tensor(x0, dtype=torch.float32)
y = torch.tensor(y0, dtype=torch.float32)

model = nn.Sequential(nn.Linear(p, 1, bias=False))
model_manager = skorch.NeuralNetRegressor(
    model,
    optimizer = torch.optim.SGD,
    lr = 1e-1,
    batch_size = n//10,
    max_epochs = N//10,
    train_split = None,
    verbose = 0
)

model_manager.fit(x, y)

print("loss =", model_manager.history[-1]["train_loss"])
print(next(model.parameters()))

Conclusion and outlook

In this post, I used both the TensorFlow ecosystem and the PyTorch ecosystem to solve the OLS problem. Although OLS is simple, the same methods (at all three levels) can be applied to any machine learning model or algorithm.

Although it is tempting to master only the high level and forego the others, an aspiring data scientist should master all three levels. The reason is that these frameworks sometimes do not provide the desired components, and you then need to write lower-level code to customize higher-level behavior (through polymorphism), as in the sketch below.
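
As a small, hypothetical illustration of that point: if Keras did not ship the loss function you needed, you could subclass keras.losses.Loss and plug the result into the same high-level compile/fit workflow (the MeanQuarticError below is made up for demonstration).

## Hypothetical example: a custom loss plugged into the high-level Keras workflow
class MeanQuarticError(keras.losses.Loss):
    """A loss Keras does not ship: the mean of the fourth power of the error."""
    def call(self, y_true, y_pred):
        return tf.reduce_mean((y_true - y_pred)**4, axis=-1)

custom_model = keras.Sequential([keras.layers.Dense(1, use_bias=False)])
custom_model.compile(
    loss = MeanQuarticError(),
    optimizer = keras.optimizers.SGD(learning_rate=1e-2)   # smaller step for the steeper loss
)
custom_model.fit(x0, y0, batch_size=n//10, epochs=N//10, verbose=0)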

Written on November 29, 2023