Youtube-sDataAnalytics-

IMAGE_1

Analysis of structured data has seen tremendous success in the past. However, analysis of large-scale unstructured data in video form remains a challenging area. YouTube, a Google company, has over a billion users and generates billions of views. Because YouTube data is created in very large volumes and at high speed, there is strong demand to store, process, and carefully study it so that it becomes usable. The main objective of this project is to demonstrate, using Hadoop concepts, how data generated from YouTube can be mined and used to make targeted, real-time, and informed decisions.

Dataset description:

Column1: Video ID, a string of 11 characters.

Column2: Uploader of the video (string).

Column3: Interval between the day YouTube was established and the date the video was uploaded (integer).

Column4: Category of the video (string).

Column5: Length of the video (integer).

Column6: Number of views of the video (integer).

Column7: Rating of the video (float).

Column8: Number of ratings given to the video.

Column9: Number of comments on the video (integer).

Column10: IDs of the videos related to the uploaded video.
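
To make the column layout concrete, here is a minimal parsing sketch in plain Python. It is not part of the project's Hadoop code. It assumes that fields are tab-separated and that the related video IDs occupy every field after the ninth; neither detail is stated above. The names `VideoRecord` and `parse_record` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class VideoRecord:
    video_id: str            # Column1
    uploader: str            # Column2
    upload_offset: int       # Column3 (interval since YouTube was established)
    category: str            # Column4
    length: int              # Column5
    views: int               # Column6
    rating: float            # Column7
    num_ratings: int         # Column8
    num_comments: int        # Column9
    related_ids: List[str]   # Column10 (assumed: all remaining fields)


def parse_record(line: str, sep: str = "\t") -> Optional[VideoRecord]:
    """Parse one line into a VideoRecord, or return None if it is malformed.

    The tab separator is an assumption about the file format.
    """
    fields = line.rstrip("\n").split(sep)
    if len(fields) < 9:
        return None  # too few columns to be a complete record
    try:
        return VideoRecord(
            video_id=fields[0],
            uploader=fields[1],
            upload_offset=int(fields[2]),
            category=fields[3],
            length=int(fields[4]),
            views=int(fields[5]),
            rating=float(fields[6]),
            num_ratings=int(fields[7]),
            num_comments=int(fields[8]),
            related_ids=[f for f in fields[9:] if f],
        )
    except ValueError:
        return None  # a numeric column did not contain a number
```

Returning `None` for short or non-numeric rows lets a later processing step skip incomplete records instead of failing on them.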

Problem statement:

1) Find the top 5 categories with the maximum number of videos uploaded.

2) Find the top 10 rated videos.

3) Find the most viewed videos.
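
The logic behind these three questions can be sketched in plain Python before it is expressed as Hadoop jobs. The snippet below is only an illustration, not the project's implementation: it works on hypothetical in-memory rows of the form `(video_id, category, views, rating)`, and the function names are made up for this example. Counting per category mirrors a map step that emits `(category, 1)` followed by a reduce step that sums the ones.

```python
from collections import Counter
import heapq

# Hypothetical sample rows: (video_id, category, views, rating)
SAMPLE = [
    ("v1", "Music", 100, 4.5),
    ("v2", "Music", 500, 3.0),
    ("v3", "Comedy", 50, 5.0),
    ("v4", "Music", 20, 4.0),
    ("v5", "Comedy", 900, 2.5),
    ("v6", "Sports", 10, 1.0),
]


def top_categories(rows, n=5):
    """Return the n categories with the most videos as (category, count) pairs."""
    counts = Counter(category for _, category, _, _ in rows)
    return counts.most_common(n)


def top_rated(rows, n=10):
    """Return the n rows with the highest rating, best first."""
    return heapq.nlargest(n, rows, key=lambda row: row[3])


def most_viewed(rows, n=10):
    """Return the n rows with the highest view count, most viewed first."""
    return heapq.nlargest(n, rows, key=lambda row: row[2])


if __name__ == "__main__":
    print(top_categories(SAMPLE))
    print([row[0] for row in top_rated(SAMPLE, 3)])
    print([row[0] for row in most_viewed(SAMPLE, 3)])
```

The third question does not say how many videos to report, so `most_viewed` takes the cutoff as a parameter; its default of 10 is an arbitrary choice.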