[Implementation: StackGAN] StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks

A good while back, I skipped out on actually implementing StackGAN.

tsunotsuno.hatenablog.com

I'll leave the theory to the post above and focus on the implementation here.

Network diagram
Implementation sample
Results from actually running it
- 64×64
- 256×256
Thoughts

Network diagram

Implementation sample

Lately I've been using PyTorch to drill the define-by-run execution model into my head, so I'll use PyTorch this time as well.

Condition Augmentation

I got pretty stuck here. It says the datasets below are used, but it took me a while to figure out how to use them.

Caltech-UCSD Bird(CUB)
Oxford-102 flower

A README I picked up somewhere describes it like this.

Download our preprocessed char-CNN-RNN text embeddings for birds and flowers and save them to 'Data/'.

[Optional] Follow the instructions here to download the pretrained char-CNN-RNN text encoders and extract your own text embeddings.

Download the birds and flowers image data. Extract them to 'Data/birds/' and 'Data/flowers/', respectively.

Preprocess images.

For birds: 'python ./misc/preprocess_birds.py'

For flowers: 'python ./misc/preprocess_flowers.py'

So, to start, I download the files from around here.

Once they are downloaded, extract the .zip file and place its contents under Data/. Next, download CUB-200-2011 from

http://www.vision.caltech.edu/visipedia/CUB-200-2011.html

and extract it into Data/ as well.

Preprocessing comes next, but when I tried to run it I got this error.

NameError: name 'xrange' is not defined

Apparently the script assumes Python 2.x. To get it running on Python 3.x, I rewrote

"xrange" => "range"

and it worked.
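If there are several scripts to fix, the same rewrite can be done mechanically. The following is only a minimal sketch: `replace_xrange` is a hypothetical helper, not something from the StackGAN repository, and it does nothing beyond swapping the standalone name `xrange` for `range`. Any other Python 2 constructs in the scripts would still need separate attention.

```python
import re

# Match the bare name `xrange` only, not identifiers that merely contain it.
_XRANGE = re.compile(r"\bxrange\b")


def replace_xrange(source: str) -> str:
    """Return `source` with every standalone `xrange` replaced by `range`."""
    return _XRANGE.sub("range", source)


# Illustrative input; the real preprocessing scripts are not reproduced here.
example = "for i in xrange(10):\n    total += i\n"
print(replace_xrange(example))
```

To apply it to a file, read the file's text, pass it through `replace_xrange`, and write the result back, ideally after making a backup copy.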

Taking another look at the contents, the file layout turned out like this.

birds
- CUB_200_2011
　- attributes
　- images
　　- xxx.image_name
　　　- image_name.jpg
　- parts
　- bounding_boxes.txt
　- classes.txt
　- image_class_labels.txt
　- README
　- train_test_split.txt
- test
　- xxx.pickle
- text_c10
　- xxx.image_name
　　- image_name.txt
- train
　- yyy.pickle
- example_captions.txt
- readme

From the look of it, train/ is meant for training and test/ for testing. Digging further into the contents, I found the following.

char-CNN-RNN-embeddings.pickle : 8855 matrices of size 10×1024, each a numerical encoding of the text
class_info.pickle : 8855 integers ranging from 2 to 200
filenames.pickle : 8855 file names

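To implement StackGAN, it should be enough to know each embedding (the text in numerical form) and the image file it corresponds to, so char-CNN-RNN-embeddings.pickle, filenames.pickle, and the image files look sufficient. (I trust the other files are used for something else. Apologies if I'm wrong.)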
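As a rough sketch of how the embeddings and image files could be paired up, something like the following might work. Several things here are assumptions rather than facts from the README: `load_pickle` and `load_pairs` are hypothetical helper names, the entries in filenames.pickle are assumed to be paths relative to the images directory without the `.jpg` extension, and `encoding="latin1"` is passed on the guess that the pickles were written by Python 2 (in line with the `xrange` error above).

```python
import os
import pickle


def load_pickle(path):
    """Load one pickle file.

    encoding="latin1" is an assumption: it is the usual way to read
    pickles written by Python 2 from Python 3.
    """
    with open(path, "rb") as f:
        return pickle.load(f, encoding="latin1")


def load_pairs(split_dir, image_root):
    """Return a list of (embedding, image_path) pairs for one split directory."""
    embeddings = load_pickle(os.path.join(split_dir, "char-CNN-RNN-embeddings.pickle"))
    filenames = load_pickle(os.path.join(split_dir, "filenames.pickle"))
    if len(embeddings) != len(filenames):
        raise ValueError("embeddings and filenames differ in length")
    # Assumption: each file name is relative to image_root and lacks ".jpg".
    return [
        (emb, os.path.join(image_root, name + ".jpg"))
        for emb, name in zip(embeddings, filenames)
    ]
```

With the layout above, a call would look like `load_pairs("Data/birds/train", "Data/birds/CUB_200_2011/images")`. Reading the real embedding pickle also requires NumPy to be installed if the arrays were stored as NumPy objects.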