Celebrity face recognition dataset

The system is trained on a celebrity face dataset and fine-tuned using Ultralytics YOLOv8 classification mode for accurate and efficient recognition.

To facilitate the above face recognition task, we provide a large training dataset that covers the top 100K celebrities. Contestants are asked to develop an image recognition system based on, but not limited to, the datasets provided by the Challenge (as training data) to recognize 1M celebrities from their face images.

To thoroughly evaluate our work, we introduce a new large-scale dataset for face recognition and retrieval across age called the Cross-Age Celebrity Dataset (CACD).

VoxCeleb and VoxCeleb2: audio-visual datasets consisting of short clips of human speech.

Multi-Modal-CelebA-HQ (MM-CelebA-HQ) is a dataset containing 30,000 high-resolution face images selected from CelebA, following CelebA-HQ. It can also be used for face recognition.

Apr 13, 2023 · An A-Z directory of databases of face stimulus images for use in behavioral research.

prateekmehta59/Celebrity-Face-Recognition-Dataset: a dataset of around 800k images consisting of 1,100 famous celebrities and an Unknown class to classify unknown faces.

May 30, 2023 · Face Recognition Models: Dive deep into the realm of face recognition models such as DeepFace, FaceNet, VGG-Face, and ArcFace, along with toolkits, datasets, and pipelines.

Our presented dataset contains many real-world photos of Korean celebrities in various environments that might contain stage lighting, backup dancers, and background objects.

CelebA has large diversities, large quantities, and rich annotations, including 10,177 identities, 202,599 face images, 5 landmark locations, and 40 binary attribute annotations per image. It is widely used in the fields of facial recognition, image generation, and facial attribute analysis, thanks to the richness of its annotations.

May 24, 2025 · The dataset consists of celebrity images from CASIA-WebFace with face masks.
The CelebFaces Attributes Dataset (CelebA) consists of more than 200K celebrity images with 40 attribute annotations each. It offers great diversity, large quantities, and rich annotations: 10,177 identities, 202,599 face images, 5 landmark locations, and 40 binary attribute annotations per image.

The raw data contains images of different sizes for 13 different Bollywood celebrities.

Description: On this page, we present the MillionCelebs Face Recognition Dataset, which contains one million celebrity face images. After preprocessing, 87.0M face images of 1M identities are collected.

Jun 1, 2024 · The dataset can be employed as the training and test sets for the following computer vision tasks: face attribute recognition, face detection, and landmark (or facial part) localization.

Celebrity Face Dataset for Face Analysis and Look-alike Detection.

Create a directory face_recognition inside the work directory. It must contain the weights for the MTCNN model (three files named det1.npy, det2.npy, and det3.npy, which can be copied from the Giphy pretrained resources archive).

Load the CACD dataset in Python fast with one line of code.

We propose a large-scale, high-quality, and diverse video dataset, named the High-Quality Celebrity Video Dataset (CelebV-HQ), with rich facial attribute annotations.

Sep 21, 2022 · In this article, I covered three of the most popular face datasets you can use to build your own face recognition and face detection models: CelebFaces, FFHQ, and LFW. I hope this will give you a head start on your next computer vision project.

If you require text annotation (e.g. for audio-visual speech recognition), also consider using the LRS dataset.

Automatically clip videos through a pipeline including face detection, face recognition, speaker validation, and speaker diarization.

Celeb Face Recognition Dataset.

The face classification system is an important tool for recognizing personal identity properly.
The discrepancy is the result of the domain gap between the training and test scenarios, since the subjects in the public datasets are mostly Caucasian.

The face masks are attached to the face images using the MaskTheFace tool.

Aug 19, 2021 · This free app lets you easily detect actors and celebrities via face recognition, powered by AI.

The masks of CelebAMask-HQ were manually annotated at a size of 512 x 512 with 19 classes, including all facial components and accessories. Each image has a segmentation mask of facial attributes corresponding to CelebA.

Static face images for all the identities in VoxCeleb2 can be found in the VGGFace2 dataset.

Check our paper for more details on the benefits.

50K Celebrity Faces Image Dataset.

Run create_celeb_model.py.

Web-crawled face images are severely biased (in terms of race, lighting, make-up, etc.) and often contain label noise.

Images cover large pose variations, background clutter, and diverse people, making this dataset great for training and testing models for face detection.

Sep 9, 2025 · Here is the list of the 20 best face recognition datasets for ML in 2025: for unlocking doors, verifying selfies, or flagging deepfakes.

The 1M celebrities are obtained from Freebase based on their occurrence frequencies (popularities) on the web.

In this project, I tried two face detectors, one of which is MTCNN.

Celebrity Classifier. Model description: this model classifies a face to a celebrity. Square cropped to face.
We use a data-driven method, called cross-age reference coding (CARC), to address the cross-age face recognition problem.

Mar 3, 2020 · Projects: The dataset can be employed as training and testing sets for the following computer vision tasks: face attribute recognition, face detection, landmark (or facial part) localisation, and face editing & synthesis.

Face detection and tracking: RetinaFace and ArcFace.

Welcome to the Celebrity Face Recognition project! This repository demonstrates a deep learning-based face recognition system using the VGGFace model.

Such models are trained on large-scale datasets that contain millions of real human face images collected from the internet.

It is trained on the tonyassi/celebrity-1000 dataset, fine-tuned from google/vit-base-patch16-224-in21k.

Benefits of using C-MS-Celeb to train a face recognition model: training a face recognition model on our C-MS-Celeb dataset can increase the model's performance.

Usage: To use this model, provide a test image with a celebrity's face, and the code will predict the celebrity's name based on the trained model.

Cropped Version: the CelebA dataset with additional postprocessing, which crops each image around the face using the face_recognition package.

Apr 5, 2016 · This year we will focus on the face recognition task.

Provide the path to the dataset folder (for example, the celeb_images folder) in the create_celeb_model.py file.

We gathered a face image dataset with 356.4K images of 7,676 Chinese celebrities, crawled online automatically from versatile sources.

However, none of these focus on the specific challenge of face recognition under the disguise covariate.

The Celebrity Face Image Dataset is a curated collection featuring images of 18 renowned Hollywood celebrities, each represented by 100 high-quality images.
By providing a diverse and comprehensive collection of celebrity images, it facilitates the creation of more accurate and reliable facial recognition systems, contributing to advancements in security, media, and artificial intelligence.

HopHill/Face-Recognition: a face recognition model built with transfer learning on the open-source Five Celebrity Faces Dataset from Kaggle.

Explore iMerit’s curated list of 17 facial recognition datasets, ranging from annotated video frames and age-labeled faces to spoof detection sets and more.

The images range from extreme poses to heavily cluttered backgrounds.

Apr 21, 2024 · Celebrity-Face-Recognition-Dataset overview. Basic information: dataset name: Celebrity-Face-Recognition-Dataset; dataset size: 172 GB; composition: about 800,000 images covering 1,100 famous celebrities plus an Unknown class for classifying unknown faces; image source: all images were scraped from Google, with no duplicates; class details: each celebrity class (folder) contains about 700-800 images.

In this paper, we design a benchmark task and provide the associated datasets for recognizing face images and linking them to corresponding entity keys in a knowledge base.

Introduction: Facial recognition technology (FRT) relies on massive datasets to “learn” faces in order to accurately identify or verify the identity of a person.

Like other face recognition projects, the procedure is: face detection, face alignment, and embedding vector generation.

Each celebrity has between 50 and 100 high-quality, localized face images.
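The last step of the procedure mentioned above (face detection, face alignment, embedding vector generation) is usually followed by comparing the query embedding with stored reference embeddings. The sketch below is only an illustration of that comparison, not the method of any project quoted here: it assumes embeddings are already available as plain lists of floats, uses cosine similarity, and falls back to an "Unknown" label below a threshold. The function names, the toy gallery, and the threshold value 0.5 are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def identify(query, gallery, threshold=0.5):
    """Return (name, score) of the most similar gallery entry.

    `gallery` maps a name to one reference embedding. If the best score is
    below `threshold` (an assumed value that would need tuning on real data),
    the face is reported as "Unknown".
    """
    best_name, best_score = "Unknown", -1.0
    for name, reference in gallery.items():
        score = cosine_similarity(query, reference)
        if score > best_score:
            best_name, best_score = name, score
    if best_score < threshold:
        return "Unknown", best_score
    return best_name, best_score


# Toy 3-dimensional embeddings; real face embeddings have far more dimensions.
gallery = {
    "celebrity_a": [1.0, 0.0, 0.0],
    "celebrity_b": [0.0, 1.0, 0.0],
}
match_name, match_score = identify([0.9, 0.1, 0.0], gallery)
unknown_name, unknown_score = identify([0.0, 0.0, 1.0], gallery)
print(match_name, unknown_name)
```

In practice the gallery would hold embeddings produced by whichever face model the project uses, and the threshold trades false matches against faces wrongly reported as unknown.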
MS-Celeb-1M (MS1M): Microsoft Celeb (MS-Celeb-1M, or MS1M) is a dataset of 10 million face images harvested from the Internet for the purpose of developing face recognition technologies. Created by Microsoft Research, it provides a massive resource for training and evaluating face recognition models.

To close this gap, we collect a Korean face recognition dataset of celebrity images from the Internet.

This repository provides the resources for a new database of Chinese celebrity face images.

VGGFace2 Dataset: a large-scale face recognition dataset.

CelebA is a popular dataset that is commonly used for face attribute recognition, face detection, landmark (or facial part) localization, and face editing & synthesis.

Celebrity recognition compared to face search (Rekognition): face recognition verifies identities, authenticates access, and aids public safety and social media applications; celebrity recognition identifies celebrities in large volumes.

The dataset (CACD) contains more than 160,000 images of 2,000 celebrities with ages ranging from 16 to 62.

Jun 30, 2025 · Train your AI systems with 19 free face recognition datasets.

Small dataset for face recognition tasks.

Face Detection Datasets.

Celebrity 1000: top 1000 celebrities.

The dataset is ideal for machine learning and deep learning projects, such as face recognition and celebrity lookalike prediction.

Recognizing One Million Celebrities: training set. For more details about the dataset, please see the dataset document.
CelebA (CelebFaces Attributes Dataset) is an iconic computer vision dataset centered on human faces.

Yifan0110/Celebrity-Face-Recognition (GitHub repository).

Oct 6, 2025 · Face identification, customization, and celebrity recognition features are available to Microsoft managed customers and partners.

A subset of the Bollywood celebrity faces dataset (more than 1,000 observations) was taken from Kaggle for face recognition purposes.

The MillionCelebs face dataset is collected from an Internet image search engine according to the Freebase celebrity name list released along with MS1M.

Famous people faces dataset to test facial recognition methods.

Table: Face recognition datasets, from the publication "MS-Celeb-1M: Challenge of Recognizing One Million Celebrities in the Real World".

Jul 16, 2024 · Top Face Datasets: Face detection can help with numerous tasks.

This project implements a Face Recognition System powered by YOLOv8.

Note: The CelebA dataset may contain potential bias.

Image size: 256x256.

This paper introduces a new Large-Scale Korean Influencer Dataset named KoIn.

Abstract: State-of-the-art face recognition models show impressive accuracy, achieving over 99.8% on the Labeled Faces in the Wild (LFW) dataset.

The images in this dataset cover large pose variations and background clutter.

Over 200k images of celebrities with 40 binary attribute annotations.

This repository contains a dataset of localized face images of 150 celebrities from Bollywood, Hollywood, and South Indian cinema, with equal representation of 50 celebrities from each industry.
Sep 7, 2024 · Background and challenges. Background overview: Celebrity-Face-Recognition-Dataset was jointly created by well-known research institutions and artificial intelligence laboratories to advance face recognition technology. Work on the dataset began in 2018, and the principal researchers include several highly regarded experts in computer vision.

Jul 10, 2020 · Details: CelebFaces Attributes Dataset (CelebA) is a large-scale face attributes dataset with more than 200K celebrity images, each with 40 attribute annotations.

YouTube Celebrities Face Tracking and Recognition Dataset.

Srikeshram/Celebrity-Face-Recognition: The Classification of 105 Celebrities with Face-Recognition using Tensorflow-Framework.

CelebAMask-HQ is a large-scale face image dataset that has 30,000 high-resolution face images selected from the CelebA dataset by following CelebA-HQ.

Apr 13, 2024 · Make sure each image has only one face (that of the desired celebrity). If there are multiple faces, only the first detected face will be considered.

The project used two celebrity datasets to train the models: the Kaggle 5 Celebrity Faces Dataset and the Code Basics dataset. The celebrities included in this project are Jerry Seinfeld, Ben Affleck, Maria Sharapova, Serena Williams, Virat Kohli, Roger Federer, Lionel Messi, Madonna, and Mindy Kaling.

Apr 17, 2017 · This paper introduces a method for face recognition across age and also a dataset containing variations of age in the wild.

The collection is performed using a semi-automatic pipeline in order to minimize manual e…

Abstract: Large-scale datasets have played indispensable roles in the recent success of face generation/editing and significantly facilitated the advances of emerging research fields.

The dataset encompasses diverse images with significant pose variations and background clutter.

Multi-view face recognition, with face cropping and saving of the cropped faces from videos as new images, to create a multi-view face recognition database.
To avoid the problems associated with real face datasets, we introduce a large-scale synthetic dataset for face recognition, obtained by photo-realistic rendering of diverse and high-quality digital faces using a computer graphics pipeline.

1800+ images of celebrity faces of 18 people! This dataset is invaluable for researchers aiming to push the boundaries of facial recognition technology.

More specifically, we propose a benchmark task to recognize one million celebrities from their face images, by using all the possibly collected face images of this individual on the web as training data.

Model purpose: one-shot face recognition, large-scale celebrity recognition with extremely imbalanced training data. Training data: MS-Celeb-1M low-shot training dataset (dataset download page); no external data used.

Sep 14, 2021 · It also contains information about bounding boxes and facial part localisation.

Data preprocessing: I divided the dataset into three parts (training set, validation set, and test set) in the proportion 60%, 20%, 20% to prepare for training.

It captures, analyzes, and compares patterns based on a person’s facial features.

It has substantial pose variations and background clutter.

It combines face detection and face classification to recognize celebrity identities in real time from images or videos.
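The 60% / 20% / 20% split described in the data preprocessing snippet above can be sketched as follows. This is a minimal illustration under stated assumptions, not the quoted author's code: samples are assumed to be (path, label) pairs, the split is done per label so every identity appears in each part, and the function name `split_dataset`, the seed, and the example file names are hypothetical.

```python
import random
from collections import defaultdict


def split_dataset(samples, fractions=(0.6, 0.2, 0.2), seed=0):
    """Split (path, label) pairs into train/validation/test lists, label by label."""
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")

    # Group image paths by identity so each identity is split separately.
    by_label = defaultdict(list)
    for path, label in samples:
        by_label[label].append(path)

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, val, test = [], [], []
    for label in sorted(by_label):
        paths = sorted(by_label[label])
        rng.shuffle(paths)
        n_train = round(len(paths) * fractions[0])
        n_val = round(len(paths) * fractions[1])
        train += [(p, label) for p in paths[:n_train]]
        val += [(p, label) for p in paths[n_train:n_train + n_val]]
        # Whatever remains after train and validation goes to the test set.
        test += [(p, label) for p in paths[n_train + n_val:]]
    return train, val, test


# Hypothetical example: two identities with ten images each.
samples = [
    (f"{name}/{i:03d}.jpg", name)
    for name in ("person_a", "person_b")
    for i in range(10)
]
train, val, test = split_dataset(samples)
print(len(train), len(val), len(test))
```

Splitting per identity rather than over the whole pool is a design choice for classification setups; for verification benchmarks, identities are often kept disjoint between training and test instead.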
Perfect for emotion detection, pose analysis, and facial recognition research.

URL: https://github.com/prateekmehta59/Celebrity-Face-Recognition-Dataset

Celebrity-Face-Recognition: In this data science and machine learning project, we classify sports personalities.

Oct 9, 2023 · The LFW (Labeled Faces in the Wild) dataset and the YTF (YouTube Faces) dataset are two popular datasets that have been widely used in the field of facial recognition.

Can you identify faces based on very few photos?

Oct 27, 2022 · This is a list of top facial recognition datasets that can be used and studied for facial recognition projects.

Each image in the dataset is accompanied by a semantic mask, sketch, descriptive text, and an image with a transparent background.

However, the academic community still lacks a video dataset with diverse facial attribute annotations, which is crucial for research on face-related videos. CelebV-HQ contains 35,666 video clips involving 15,653 identities and 83 manually labeled facial attributes covering appearance, action, and emotion.

The Indian Celebrity Dataset for Face Recognition (ICDFR) is a new dataset compiled from publicly available images of Indian celebrities, including cricketers, actors, politicians, social workers, scientists, and other celebrities.

Dataset description (tonyassi/celebrity-1000): top 1000 celebrities; 18,184 images.
To apply for access, use the facial recognition intake form.

This training dataset is prepared by the following steps. First, we select the top 100K entities from the 1M celebrity list in terms of their popularity.

Each celebrity class (folder) consists of approximately 700-800 images, and the Unknown class consists of 100k images.

CACD has 160,000 images of 2,000 celebrities.

Dive into 17 celebrity worlds with 100 glamorous images each!

The CelebA: Large-Scale CelebFaces Attributes Dataset comprises over 200,000 celebrity images, each annotated with 40 attributes.

Faces will be explored using Haar-like classifiers in OpenCV.

Mar 16, 2024 · The publicly available datasets are much smaller than those used privately in industry, such as by Facebook [2, 3] and Google [4], as summarized in Table 1.

Additionally, a Variational Autoencoder (VAE) is a powerful generative model that can be used to generate new face images for face recognition using a dataset like CelebA.
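For a dataset laid out as one folder per celebrity class plus an Unknown folder, as described above, a quick sanity check is to count the images in each class folder before training. The sketch below is an assumption-laden illustration: the set of image extensions, the function name `count_images_per_class`, and the demo folder names are hypothetical, and the demo runs on a throwaway directory rather than a real dataset.

```python
import tempfile
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}  # assumed image extensions


def count_images_per_class(root):
    """Return {class_folder_name: image_count} for a folder-per-class dataset."""
    counts = {}
    for class_dir in sorted(Path(root).iterdir()):
        if not class_dir.is_dir():
            continue  # ignore stray files next to the class folders
        counts[class_dir.name] = sum(
            1
            for f in class_dir.iterdir()
            if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
        )
    return counts


# Tiny demo on a temporary directory tree (folder names are hypothetical).
with tempfile.TemporaryDirectory() as tmp:
    for name, n in (("celebrity_a", 3), ("Unknown", 5)):
        class_dir = Path(tmp) / name
        class_dir.mkdir()
        for i in range(n):
            (class_dir / f"{i:04d}.jpg").touch()
    demo_counts = count_images_per_class(tmp)

print(demo_counts)
```

Per-class counts make class imbalance visible early, for example a very large Unknown class next to much smaller celebrity classes.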
This dataset is an essential resource for researchers, developers, and data scientists working on machine learning, deep learning, and computer vision projects, particularly those focused on facial recognition and emotion detection.

The Classification of 105 Celebrities with Face-Recognition using Tensorflow-Framework (Apache-2.0 license).

According to Microsoft Research, who created and published the dataset in 2016, MS Celeb is the largest publicly available face recognition dataset in the world, containing over 10 million images.

Discover the Open Celebrity Faces Dataset, tailored for evaluating face reidentification and recognition technologies.

Though research in face recognition highly desires large datasets consisting of many distinct people, such large datasets are not easily or publicly accessible to most researchers.

Several deep learning models for face detection and face recognition are explored and compared.

Jan 3, 2025 · This dataset is a collection of three datasets: ORL Faces (~400 images, 40 classes); LFW Dataset (~13,000 images, ~5,700 classes); and CelebA Dataset (200k+ images, 10k+ identities) in two versions. Raw Version: the classic CelebA dataset, retrieved from the Kaggle page.

Examples include recognizing individual identities, analyzing facial expressions, and detecting facial features.

Nov 7, 2021 · The face recognition research community has prepared several large-scale datasets captured in uncontrolled scenarios for performing face recognition.
I showed technical details that can help you retrieve the datasets and use them in your model code.

This project can be further extended by adding more celebrities to the dataset, fine-tuning hyperparameters, and experimenting with other deep learning models to improve recognition accuracy.

Apr 24, 2025 · The MS-Celeb-1M dataset is a large-scale face recognition dataset with 10 million images of 100,000 celebrities.

Image databases about automatically detecting human faces in images or videos.

In this paper we present a celebrity face matching system to match a random human face with celebrities' faces taken from the Pins Face Recognition dataset.

Collect audio data of 1,000 Chinese celebrities. Create a benchmark database for the speaker recognition community.

Stream CACD while training in PyTorch & TensorFlow.

We restrict classification to only 5 people: Maria Sharapova, Serena Williams, Virat Kohli, Roger Federer, and Lionel Messi. Here is the folder structure: model: contains the Python notebook for model building.

Can be used for a celebrity predictor model.

The system identifies and verifies celebrity faces from a dataset using pre-trained facial embeddings.