Differentiable Cross-Modal Hashing via Multimodal Transformers (paper)

This project has been moved to clip-based-cross-modal-hash

Framework

The main architecture of our method. (figure: framework)

We propose a selecting mechanism for generating hash codes that transforms the discrete space into a continuous one. Each hash code is encoded as a series of 2D vectors. (figure: hash)
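The idea above can be sketched as follows. This is a minimal illustration under our own assumptions (one 2D logit vector per bit, a softmax relaxation, and the probability difference as a continuous surrogate for the sign), not the exact implementation in this repository:

```python
import math

def select_bit(logits, temperature=1.0):
    """Relax one hash bit: a 2D logit vector -> continuous value in (-1, 1).

    Softmax over the two entries gives p(bit = +1) and p(bit = -1);
    their difference is a differentiable surrogate for the discrete sign.
    """
    a, b = (v / temperature for v in logits)
    m = max(a, b)  # subtract the max for numerical stability
    ea, eb = math.exp(a - m), math.exp(b - m)
    p_pos, p_neg = ea / (ea + eb), eb / (ea + eb)
    return p_pos - p_neg

def continuous_hash(logit_pairs, temperature=1.0):
    """Encode a hash code as a series of 2D vectors and relax each bit."""
    return [select_bit(pair, temperature) for pair in logit_pairs]

code = continuous_hash([(2.0, -2.0), (-1.0, 1.0), (0.0, 0.0)])
# a lower temperature pushes values toward the discrete codes {-1, +1}
sharp = continuous_hash([(2.0, -2.0), (-1.0, 1.0)], temperature=0.1)
```

Because every step is differentiable, gradients can flow through the bit selection during training, while a simple sign readout recovers the discrete code at retrieval time.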

Dependencies

The code is written in Python; install the following packages to run it:

  • pytorch 1.9.1
  • sklearn
  • tqdm
  • pillow

Training

Processing dataset

Before training, download the original data from COCO (2017 train, val, and annotations), NUS-WIDE (all files), and MIRFLICKR-25K (mirflickr25k and mirflickr25k_annotations_v080), then use the "data/make_XXX.py" scripts to generate the .mat files.

For example:

cd COCO_DIR # contains train/val images and annotation files

mkdir mat

cp DCMHT/data/make_coco.py mat

python make_coco.py --coco-dir ../ --save-dir ./
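As a rough illustration of what such a script produces, each image ends up paired with a caption and a multi-hot label vector over the dataset's categories. The function and category list below are hypothetical, for illustration only, and are not taken from make_coco.py:

```python
def multi_hot(labels, categories):
    """Encode an image's category labels as a multi-hot vector."""
    index = {name: i for i, name in enumerate(categories)}
    vec = [0] * len(categories)
    for name in labels:
        vec[index[name]] = 1
    return vec

# hypothetical category list; COCO itself defines 80 categories
categories = ["person", "dog", "car", "bicycle"]
row = multi_hot(["person", "car"], categories)
```

One such row per image, stacked into a matrix, is what label.mat stores.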

After all the .mat files are generated, the dataset directory will look like this:

dataset
├── base.py
├── __init__.py
├── dataloader.py
├── coco
│   ├── caption.mat 
│   ├── index.mat
│   └── label.mat 
├── flickr25k
│   ├── caption.mat
│   ├── index.mat
│   └── label.mat
└── nuswide
    ├── caption.txt  # Notice! It is a txt file!
    ├── index.mat 
    └── label.mat

Download CLIP pretrained model

The URLs of the pretrained models can be found around line 30 of CLIP/clip/clip.py. This code is based on "ViT-B/32".

You should copy ViT-B-32.pt to this dir.

Start

After the dataset has been prepared, run the following command to train:

python main.py --is-train --hash-layer select --dataset coco --caption-file caption.mat --index-file index.mat --label-file label.mat --similarity-function euclidean --loss-type l2 --vartheta 0.75 --lr 0.0001 --output-dim 64 --save-dir ./result/coco/64 --clip-path ./ViT-B-32.pt --batch-size 256
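The --similarity-function euclidean and --loss-type l2 flags refer to the usual distance computations. A minimal sketch of both follows; this is our own illustration of the standard definitions, not the code in main.py:

```python
import math

def euclidean_similarity(x, y):
    """Negative euclidean distance between two continuous hash codes:
    closer codes get a higher (less negative) similarity."""
    return -math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def l2_loss(pred, target):
    """Mean squared error between predicted and target values."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

img_code = [1.0, -1.0, 1.0, -1.0]
txt_code = [1.0, -1.0, -1.0, -1.0]
sim = euclidean_similarity(img_code, txt_code)   # -2.0: codes differ in one bit
loss = l2_loss(img_code, txt_code)               # 1.0
```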

Result

(figure: result)

Citation

@inproceedings{10.1145/3503161.3548187,
  author = {Tu, Junfeng and Liu, Xueliang and Lin, Zongxiang and Hong, Richang and Wang, Meng},
  title = {Differentiable Cross-Modal Hashing via Multimodal Transformers},
  year = {2022},
  booktitle = {Proceedings of the 30th ACM International Conference on Multimedia},
  pages = {453--461},
  numpages = {9},
}

Acknowledgements

CLIP

SSAH

GCH

AGAH

DADH

deep-cross-modal-hashing

Apology:

2023/03/01

I found that Figure 1 shows the wrong formula for vartheta; the correct one is Equation (10). The paper has already been published, so I cannot fix it there.