Differentiable Cross-Modal Hashing via Multimodal Transformers (paper)

This project has been moved to clip-based-cross-modal-hash

Framework

The main architecture of our method. (figure: framework)

We propose a selecting mechanism for generating hash codes that transforms the discrete space into a continuous one. Each hash code is encoded as a series of 2D vectors. (figure: hash)
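The idea above can be sketched as follows. This is a minimal illustration under our own assumptions (one 2D logit vector per bit, a softmax relaxation, and the probability difference as a continuous surrogate for the sign), not the exact implementation in this repository:

```python
import math

def select_bit(logits, temperature=1.0):
    """Relax one hash bit: a 2D logit vector -> continuous value in (-1, 1).

    Softmax over the two entries gives p(bit = +1) and p(bit = -1);
    their difference is a differentiable surrogate for the discrete sign.
    """
    a, b = (v / temperature for v in logits)
    m = max(a, b)  # subtract the max for numerical stability
    ea, eb = math.exp(a - m), math.exp(b - m)
    p_pos, p_neg = ea / (ea + eb), eb / (ea + eb)
    return p_pos - p_neg

def continuous_hash(logit_pairs, temperature=1.0):
    """Encode a hash code as a series of 2D vectors and relax each bit."""
    return [select_bit(pair, temperature) for pair in logit_pairs]

code = continuous_hash([(2.0, -2.0), (-1.0, 1.0), (0.0, 0.0)])
# a lower temperature pushes values toward the discrete codes {-1, +1}
sharp = continuous_hash([(2.0, -2.0), (-1.0, 1.0)], temperature=0.1)
```

Because every step is differentiable, gradients can flow through the bit selection during training, while a simple sign readout recovers the discrete code at retrieval time.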

Dependencies

The code is written in Python; install the following packages to run it:

  • pytorch 1.9.1
  • sklearn
  • tqdm
  • pillow

Training

Processing dataset

Before training, download the original data from COCO (2017 train, val, and annotations), NUS-WIDE (all files), and MIRFLICKR-25K (mirflickr25k and mirflickr25k_annotations_v080), then use the "data/make_XXX.py" scripts to generate the .mat files.

For example:

cd COCO_DIR # contains train/val images and annotation files

mkdir mat

cp DCMHT/data/make_coco.py mat

python make_coco.py --coco-dir ../ --save-dir ./
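As a rough illustration of what such a script produces, each image ends up paired with a caption and a multi-hot label vector over the dataset's categories. The function and category list below are hypothetical, for illustration only, and are not taken from make_coco.py:

```python
def multi_hot(labels, categories):
    """Encode an image's category labels as a multi-hot vector."""
    index = {name: i for i, name in enumerate(categories)}
    vec = [0] * len(categories)
    for name in labels:
        vec[index[name]] = 1
    return vec

# hypothetical category list; COCO itself defines 80 categories
categories = ["person", "dog", "car", "bicycle"]
row = multi_hot(["person", "car"], categories)
```

One such row per image, stacked into a matrix, is what label.mat stores.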

After all the .mat files are generated, the dataset directory will look like this:

dataset
├── base.py
├── __init__.py
├── dataloader.py
├── coco
│   ├── caption.mat 
│   ├── index.mat
│   └── label.mat 
├── flickr25k
│   ├── caption.mat
│   ├── index.mat
│   └── label.mat
└── nuswide
    ├── caption.txt  # Notice! It is a txt file!
    ├── index.mat 
    └── label.mat

Download CLIP pretrained model

The URLs of the pretrained models can be found around line 30 of CLIP/clip/clip.py. This code is based on "ViT-B/32".

You should copy ViT-B-32.pt to this dir.

Start

After the dataset has been prepared, run the following command to train:

python main.py --is-train --hash-layer select --dataset coco --caption-file caption.mat --index-file index.mat --label-file label.mat --similarity-function euclidean --loss-type l2 --vartheta 0.75 --lr 0.0001 --output-dim 64 --save-dir ./result/coco/64 --clip-path ./ViT-B-32.pt --batch-size 256
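The --similarity-function euclidean and --loss-type l2 flags refer to the usual distance computations. A minimal sketch of both follows; this is our own illustration of the standard definitions, not the code in main.py:

```python
import math

def euclidean_similarity(x, y):
    """Negative euclidean distance between two continuous hash codes:
    closer codes get a higher (less negative) similarity."""
    return -math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def l2_loss(pred, target):
    """Mean squared error between predicted and target values."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

img_code = [1.0, -1.0, 1.0, -1.0]
txt_code = [1.0, -1.0, -1.0, -1.0]
sim = euclidean_similarity(img_code, txt_code)   # -2.0: codes differ in one bit
loss = l2_loss(img_code, txt_code)               # 1.0
```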

Result

(figure: result)

Citation

@inproceedings{10.1145/3503161.3548187,
  author = {Tu, Junfeng and Liu, Xueliang and Lin, Zongxiang and Hong, Richang and Wang, Meng},
  title = {Differentiable Cross-Modal Hashing via Multimodal Transformers},
  year = {2022},
  booktitle = {Proceedings of the 30th ACM International Conference on Multimedia},
  pages = {453--461},
  numpages = {9},
}

Acknowledgements

CLIP

SSAH

GCH

AGAH

DADH

deep-cross-modal-hashing

Apology:

2023/03/01

I found that Figure 1 shows the wrong formula for vartheta; the correct one is Equation (10). The paper has already been published, so I cannot fix it there.