This project implements an image caption generator that combines a VGG16 network for image feature extraction with an LSTM network for text generation.
- Image Feature Extraction: VGG16 pre-trained on ImageNet
- Text Generation: Long Short-Term Memory (LSTM) network
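At inference time, a decoder such as the LSTM above usually produces a caption one word at a time. The following is a minimal sketch of greedy decoding, not the code used in this repository. The `predict_next` callable, the `startseq`/`endseq` marker tokens, and the toy vocabulary are all assumptions made for illustration; in a real setup `predict_next` would wrap the trained model together with the VGG16 features of one image.

```python
from typing import Callable, Dict, List


def greedy_caption(
    predict_next: Callable[[List[str]], Dict[str, float]],
    max_length: int = 20,
    start_token: str = "startseq",  # assumed start-of-caption marker
    end_token: str = "endseq",      # assumed end-of-caption marker
) -> str:
    """Build a caption by repeatedly taking the most probable next word."""
    words = [start_token]
    for _ in range(max_length):
        # predict_next is hypothetical: it maps the words generated so far
        # to a {word: probability} dictionary.
        probabilities = predict_next(words)
        next_word = max(probabilities, key=probabilities.get)
        if next_word == end_token:
            break
        words.append(next_word)
    # Drop the start marker before returning the caption.
    return " ".join(words[1:])


def toy_predict_next(words: List[str]) -> Dict[str, float]:
    """Stand-in for a trained model: follows a fixed script of words."""
    script = {"startseq": "a", "a": "dog", "dog": "runs", "runs": "endseq"}
    best = script.get(words[-1], "endseq")
    vocabulary = ["a", "dog", "runs", "endseq"]
    return {word: (0.9 if word == best else 0.02) for word in vocabulary}


print(greedy_caption(toy_predict_next))
```

Greedy decoding is the simplest choice; beam search is a common alternative when caption quality matters more than speed.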
The model is trained on the Flickr 8k dataset, which contains:
- Roughly 8,000 images
- 5 captions per image
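Because each image has several captions, a common first step is to group the captions by image. The sketch below assumes a caption file in which each line looks like `<image name>#<index><TAB><caption>`; that layout, the function name `group_captions`, and the sample file names are assumptions for illustration, so adjust the parsing to match the caption file you actually download.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_captions(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map each image name to the list of its captions."""
    captions = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        image_key, _, caption = line.partition("\t")
        if not caption:
            continue  # skip lines that do not match the assumed layout
        # Remove the "#<index>" suffix so all captions share one key.
        image_name = image_key.split("#")[0]
        captions[image_name].append(caption.strip().lower())
    return dict(captions)


# Hypothetical sample lines in the assumed layout.
sample_lines = [
    "img1.jpg#0\tA dog runs on the grass",
    "img1.jpg#1\tA brown dog is running",
    "img2.jpg#0\tTwo children play football",
]

grouped = group_captions(sample_lines)
print(len(grouped["img1.jpg"]))
```

With the full dataset, each key in the returned dictionary should hold five captions.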
Requirements:
- Python 3.7+
- TensorFlow 2.0
- Keras
- NumPy
- Matplotlib
- Clone the repository: `git clone https://github.com/Roronoa-17/Image_Caption_Generator.git`
- Install the required packages: `pip install -r requirements.txt`
- Run the caption generator: `streamlit run app.py`
- Download the Flickr 8k dataset
- Open the Python notebook for further instructions.