DHE (Deep Hash Embedding)


Published: April 04, 2022

Abstract

This paper proposes a new method for generating embedding vectors in recommendation models. Compared to looking up embedding vectors from an embedding table, deep hash embedding (DHE) can reduce the model size by 75% while achieving similar AUC and better generalization.

One-hot based embedding learning

For a categorical feature (e.g., country), a recommendation model usually converts each feature value into a one-hot vector and uses it to look up an embedding vector from an embedding table (initialized with random values). For example, if the feature value is “US”, it is converted into a one-hot vector $[0, 0, …, 1, 0, …, 0]$ with a single 1 at the position assigned to “US”.

The one-hot vector can be denoted as $x$, with dimensions $1 \times m$, where $m$ is the cardinality of the feature (its number of distinct values). The model then looks up an embedding vector from an embedding table $W$ with dimensions $m \times n$, where $n$ is the embedding width. This “lookup” can be written as $xW$, a linear feature transformation: since $x$ has a single 1 at some position $i$, $xW$ is exactly the $i$-th row of $W$. The table $W$ is learned by back propagation.
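A minimal sketch in pure Python checks that multiplying the one-hot vector by $W$ selects exactly one row. The toy five-country vocabulary, the sizes, and the helper names `one_hot` and `row_times_matrix` are hypothetical, chosen only for illustration:

```python
import random

def one_hot(index: int, m: int) -> list:
    """Return a 1 x m one-hot vector with a single 1 at `index`."""
    x = [0.0] * m
    x[index] = 1.0
    return x

def row_times_matrix(x: list, W: list) -> list:
    """Compute the 1 x n product xW for a 1 x m row vector x and an m x n matrix W."""
    return [sum(x[i] * W[i][j] for i in range(len(x))) for j in range(len(W[0]))]

# Hypothetical vocabulary for the "country" feature: cardinality m = 5.
vocab = {"CA": 0, "CN": 1, "FR": 2, "JP": 3, "US": 4}
m, n = len(vocab), 3  # n is the embedding width

# Randomly initialized m x n embedding table W.
rng = random.Random(0)
W = [[rng.gauss(0.0, 0.1) for _ in range(n)] for _ in range(m)]

x = one_hot(vocab["US"], m)
via_matmul = row_times_matrix(x, W)  # the linear transformation xW
via_lookup = W[vocab["US"]]          # the direct table lookup
print(via_matmul == via_lookup)      # True
```

In practice frameworks implement the lookup as direct row indexing, since the matrix product wastes $m \times n$ multiplications on zeros.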

Deep hash embedding

One-hot based embedding learning has two limitations. (1) When $m$ is very large, storing the embedding table takes a lot of memory (e.g., a 1 billion × 100 table of 32-bit floats takes 400 GB). (2) The hashing trick can reduce the original cardinality of the feature, but the model still cannot generalize to feature values that were unseen during training: at inference time, such a value has no embedding of its own.
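Both points can be sketched in a few lines of Python. This assumes 4-byte (float32) parameters, and the SHA-256 based bucket function is just one illustrative way to implement the hashing trick; `table_bytes` and `hashed_row` are hypothetical helpers:

```python
import hashlib

def table_bytes(m: int, n: int, bytes_per_value: int = 4) -> int:
    """Memory for an m x n embedding table, assuming float32 (4 bytes) values."""
    return m * n * bytes_per_value

print(table_bytes(10**9, 100) / 10**9, "GB")  # 400.0 GB

def hashed_row(value: str, num_buckets: int) -> int:
    """Hashing trick: map a raw feature value to one of num_buckets table rows."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "little") % num_buckets

# Shrinking the table to 10 rows means 11 distinct values must share at least
# one row (pigeonhole principle), so different values get the same embedding.
values = [f"user_{i}" for i in range(11)]
rows = [hashed_row(v, 10) for v in values]
print(len(set(rows)) < len(values))  # True

# A value never seen in training still lands on some existing row, but that
# row's embedding was learned (if at all) for other values, not for this one.
print(hashed_row("brand_new_value", 10))
```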

The innovation of this paper is to replace the traditional shallow, wide embedding layer with a DNN whose input is a hash encoding of the feature value. This reduces the number of parameters, because the size of the DNN does not grow with the cardinality $m$, and it improves generalization, because the same DNN weights can be applied to any feature value, including values unseen during training.
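The sketch below illustrates the idea under stated assumptions. Following the general design of DHE as we understand it, each value is encoded with $k$ hash functions into a dense vector of integers, the integers are rescaled to $[-1, 1]$, and a small DNN maps this encoding to an embedding. The specific choices here (universal hashing modulo a Mersenne prime, SHA-256 to turn strings into integers, $k = 64$, a single hidden layer, untrained random weights, and the `DHEEmbedder` class itself) are hypothetical simplifications, not the paper's exact configuration:

```python
import hashlib
import math
import random

P = 2**61 - 1  # large (Mersenne) prime for universal hashing

def mish(x: float) -> float:
    """Mish activation x * tanh(softplus(x)), with a numerically stable softplus."""
    softplus = max(x, 0.0) + math.log1p(math.exp(-abs(x)))
    return x * math.tanh(softplus)

def to_int(value: str) -> int:
    """Turn any raw feature value (seen or unseen) into a stable 64-bit integer."""
    return int.from_bytes(hashlib.sha256(value.encode("utf-8")).digest()[:8], "little")

class DHEEmbedder:
    """Hypothetical DHE-style embedder: k hashes -> dense encoding -> MLP -> embedding."""

    def __init__(self, k=64, m=10**6, hidden=32, n=8, seed=0):
        rng = random.Random(seed)
        self.m = m
        # k universal hash functions h_i(x) = ((a_i * x + b_i) mod P) mod m (not trainable).
        self.hash_params = [(rng.randrange(1, P), rng.randrange(0, P)) for _ in range(k)]
        # Two-layer MLP k -> hidden -> n. Weights are random here; training would use backprop.
        self.W1 = [[rng.gauss(0.0, 1.0 / math.sqrt(k)) for _ in range(hidden)] for _ in range(k)]
        self.b1 = [0.0] * hidden
        self.W2 = [[rng.gauss(0.0, 1.0 / math.sqrt(hidden)) for _ in range(n)] for _ in range(hidden)]
        self.b2 = [0.0] * n

    def encode(self, value: str) -> list:
        """Dense hash encoding: k integers in [0, m), rescaled to [-1, 1]."""
        x = to_int(value)
        codes = [((a * x + b) % P) % self.m for a, b in self.hash_params]
        return [2.0 * c / (self.m - 1) - 1.0 for c in codes]

    def embed(self, value: str) -> list:
        """Feed the encoding through the MLP, with Mish on the hidden layer."""
        e = self.encode(value)
        z = [mish(sum(e[i] * self.W1[i][j] for i in range(len(e))) + self.b1[j])
             for j in range(len(self.b1))]
        return [sum(z[i] * self.W2[i][j] for i in range(len(z))) + self.b2[j]
                for j in range(len(self.b2))]

    def num_params(self) -> int:
        """Trainable parameters; this does not depend on m or the vocabulary size."""
        k, hidden, n = len(self.W1), len(self.b1), len(self.b2)
        return k * hidden + hidden + hidden * n + n

dhe = DHEEmbedder()
print(len(dhe.embed("US")))                              # 8
print(dhe.embed("US") == dhe.embed("US"))                # True (deterministic)
print(len(dhe.embed("a value never seen in training")))  # 8 (unseen values still embed)
print(dhe.num_params())                                  # 2344
```

For comparison, a one-hot embedding table for $m = 10^6$ values with width $n = 8$ would need $8 \times 10^6$ parameters, while this sketch uses 2,344 regardless of $m$.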

The limitation of this method, as the paper points out, is that it tends to underfit the dataset. The best performance was achieved with the Mish activation function. Perhaps this is because the hash encoding fed into the DNN is too complex to fit easily.
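For reference, Mish is defined as $\mathrm{Mish}(x) = x \tanh(\ln(1 + e^x))$. The small Python sketch below compares it with ReLU (the comparison is added only for illustration). Unlike ReLU, Mish is smooth and lets small negative values through, so its gradient is not zero for negative inputs:

```python
import math

def softplus(x: float) -> float:
    """Numerically stable ln(1 + e^x)."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def mish(x: float) -> float:
    """Mish(x) = x * tanh(softplus(x))."""
    return x * math.tanh(softplus(x))

def relu(x: float) -> float:
    """ReLU(x) = max(x, 0), shown for comparison."""
    return max(x, 0.0)

for x in [-3.0, -1.0, 0.0, 1.0, 3.0]:
    print(f"x={x:+.1f}  relu={relu(x):.4f}  mish={mish(x):+.4f}")
```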

An interesting future direction is to jointly model multiple features using DHE.

Qianying Zhou
