Abstract— YouTube programs are attracting growing publicity, and people
spend a great deal of time watching them. These programs attract not only
audiences but also many advertisers, so we propose a new method to help
advertisers find the most popular programs and channels. An accurate and
sensible forecast of a program's popularity is of great value to content
providers, advertisers, and broadcast TV operators, and advertisers can use it
to make profitable investment plans. Many prediction models are commonly used
to predict program popularity, but these methods require abundant samples and
extensive training and have poor prediction accuracy. We propose an improved
prediction approach that uses the k-medoids algorithm to cluster the data into
four trends, which are then given as input to a gradient boosting decision tree
and to the extreme gradient boosting algorithm. Finally, we compare the two
algorithms on this data to check which one gives the better prediction.

Keywords— YouTube, predicting names, k-medoids algorithm, random forest regression.

I. Introduction

As new technologies such as 3D technology become more popular, people spend
more and more time on internet videos, which has led broadcast TV channels to
publish their programs on platforms such as YouTube. Telecasting TV programs
over the internet to increase their popularity is now an emerging trend.
According to recent studies, internet streaming of broadcast TV programs will
continue to grow at a rapid pace. Not all programs get an equal response: only
a few gain enormous user attention, while the remaining programs attract hardly
any viewers.


In this context, it is of great importance to forecast the popularity of these
programs. With program popularity predictions, audiences save time when trying
to discover valuable programs among massive collections of video resources,
which improves user satisfaction. Based on program popularity data, a company
can maximize its marketing effect by choosing the programs with the highest
potential.


However, accurately predicting the popularity of broadcast TV programs is a
difficult task, because it depends on factors such as the quality of the
program and the interests of the audience. In addition, there is a massive gap
between the popularity evolutionary trends of different programs, which should
be considered when designing the prediction model [5]. This paper proposes an
enhanced method to predict popularity among YouTube programs. The main aspects
of our work on popularity prediction are as follows:

First, we use the k-medoids algorithm to cluster programs with similar
popularity into four evolutionary trends. This approach gives more efficient
outcomes than the previous methods used to delineate popularity evolutionary
trends [5]. Second, we build trend-specific prediction models using the
gradient boosting algorithm and the extreme gradient boosting algorithm and
find out which one achieves the higher overall predictive performance.












Fig-1. Flow of methodology


Program popularity prediction began with news articles, which opened a new
direction for online content prediction and introduced methods to predict news
comment volume and the popularity of news articles, such as those discussed in
[1]. A similar method is discussed in [2]; it uses a log-linear model to
predict the data. The authors of [3] observed that a Poisson process can
describe the popularity gained by videos, which followed three popularity
evolutionary trends. The authors of [4] used YouTube data to forecast the
popularity of web content based on the chronological information given by early
popularity measures. The authors of [5] used the k-medoids algorithm along with
random forest regression to predict the popularity of broadcast TV programs,
identifying the name of the program that is popular. The earlier methods focus
only on a general model for predicting the popularity of a program and are
ineffective at predicting popularity among broadcast channels; [5] is the only
method that predicts popularity among broadcast TV channels.


The methodology includes three different algorithms. First, the k-medoids
algorithm finds the evolutionary trends. Program popularity follows different
types of propagation trends, each with different features, so the data are more
useful when programs are grouped by trend. To group them, k-medoids is used in
place of k-means clustering. It has been reported that using more than four
clusters does not give an accurate prediction model, so this paper uses four
clusters, which are then passed to the gradient boosting algorithm to make the
predictions.

Finally, the extreme gradient boosting algorithm is used with the same input as
gradient boosting to predict the popularity of programs, and we check which one
provides the better predictions.




This section describes the k-medoids clustering of program popularity into four
trends [5]. In k-means clustering, the center of a cluster is the mean of the
members of the cluster, whereas in k-medoids the center is a medoid: an actual
member of the cluster whose total dissimilarity to the other members is
smallest. Because a medoid is less affected by outliers than a mean, k-medoids
gives more robust results than k-means. The other steps of the k-medoids
algorithm are similar to those of k-means. The resulting clusters are then used
by the different algorithms to predict popularity.
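
The clustering step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the R implementation used in this paper: the Manhattan distance, the toy three-point popularity curves, the explicit starting medoids, and the names `manhattan` and `k_medoids` are all hypothetical choices made for the example.

```python
import random


def manhattan(a, b):
    """Sum of absolute differences between two equal-length curves."""
    return sum(abs(x - y) for x, y in zip(a, b))


def k_medoids(points, k, init=None, max_iter=100, seed=0):
    """Alternating k-medoids; returns (medoid indices, cluster label per point)."""
    # Start from the given medoid indices, or from k random distinct points.
    if init is not None:
        medoids = list(init)
    else:
        medoids = random.Random(seed).sample(range(len(points)), k)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest medoid.
        clusters = {m: [] for m in medoids}
        for i, p in enumerate(points):
            nearest = min(medoids, key=lambda m: manhattan(p, points[m]))
            clusters[nearest].append(i)
        # Update step: the new medoid is the member with the smallest
        # total distance to all members of its cluster.
        new_medoids = []
        for m, members in clusters.items():
            if not members:  # keep the old medoid if its cluster emptied out
                new_medoids.append(m)
                continue
            new_medoids.append(min(
                members,
                key=lambda c: sum(manhattan(points[c], points[j]) for j in members),
            ))
        if new_medoids == medoids:  # converged: no medoid changed
            break
        medoids = new_medoids
    labels = [min(range(k), key=lambda c: manhattan(p, points[medoids[c]]))
              for p in points]
    return medoids, labels


# Hypothetical popularity curves (views on three consecutive days), two per trend.
curves = [
    (0, 0, 10), (0, 1, 9),   # late peak
    (10, 0, 0), (9, 1, 0),   # early peak
    (0, 10, 0), (1, 9, 0),   # middle peak
    (3, 3, 3), (4, 3, 3),    # flat
]
medoids, labels = k_medoids(curves, k=4, init=[0, 1, 2, 4])
print(medoids, labels)
```

Unlike a k-means centroid, every center returned here is an index into `curves`, so each trend is represented by a real program's curve.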




This section describes the use of the gradient boosting algorithm to build
trend-specific prediction models. It produces more accurate values than the
other prediction algorithms. Decision trees are sensitive to the data on which
they are trained [5]. In other tree ensembles the trees have high structural
similarity, whereas in gradient boosting each tree is different because it is
fitted to the errors left by the trees before it. It is reported that stable
results for estimating variable importance are achieved with a higher value
[5].
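
The idea of fitting each new tree to the remaining errors can be sketched as follows. This is a minimal Python illustration, not the R implementation used in this paper: it assumes squared-error loss, a single input feature, and one-split trees (stumps), and the function names and toy data are hypothetical.

```python
def fit_stump(xs, residuals):
    """Best one-split regression tree on a single feature (least squares).

    Requires at least two distinct values in xs.
    """
    best = None
    for t in sorted(set(xs))[:-1]:  # split after every distinct value but the last
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, t, left_mean, right_mean)
    return best[1:]  # (threshold, left value, right value)


def gradient_boost(xs, ys, n_trees=50, learning_rate=0.1):
    """Fit stumps one after another, each on the current residuals."""
    base = sum(ys) / len(ys)  # initial prediction: the mean target
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_trees):
        # For squared error the negative gradient is the residual y - prediction.
        residuals = [y - p for y, p in zip(ys, preds)]
        t, left_value, right_value = fit_stump(xs, residuals)
        stumps.append((t, left_value, right_value))
        preds = [p + learning_rate * (left_value if x <= t else right_value)
                 for x, p in zip(xs, preds)]
    return base, learning_rate, stumps


def predict(model, x):
    """Base value plus the shrunken contribution of every stump."""
    base, learning_rate, stumps = model
    return base + learning_rate * sum(lv if x <= t else rv for t, lv, rv in stumps)


def mse(model, xs, ys):
    """Mean squared error of the model on the given data."""
    return sum((y - predict(model, x)) ** 2 for x, y in zip(xs, ys)) / len(ys)


# Hypothetical data: days since publication and view counts (in thousands).
days = [1, 2, 3, 4, 5, 6, 7, 8]
views = [10, 12, 15, 30, 55, 60, 62, 63]
model = gradient_boost(days, views)
print(mse(model, days, views))
```

With squared-error loss and a learning rate below 1, each added stump lowers the training error, which is why the ensemble fits better than any single tree; held-out data would be needed to judge over-fitting.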




This section discusses the extreme gradient boosting algorithm, which is built
on the principles of gradient boosting. The difference between the two is that
extreme gradient boosting uses a regularized model formulation to control
over-fitting, which gives better performance. A single decision tree can
over-fit; gradient boosting overcomes this by combining hundreds of trees, each
containing a few leaf nodes [5]. The extreme gradient boosting model gives
better forecasting performance than the other models and is also fast: it is
reported to be about ten times faster than other algorithms. The decision trees
are built to predict new popularity trends, so the model predicts popularity
among YouTube videos efficiently.
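
The effect of the regularized formulation can be illustrated with the closed-form leaf value and split gain that follow from an L2 penalty on leaf values plus a fixed cost per split. The sketch below assumes squared-error loss, for which each sample has gradient equal to minus its residual and Hessian 1. The parameter names `reg_lambda` and `gamma` mirror common XGBoost naming, but the functions are standalone illustrations, not library calls, and the residual values are hypothetical.

```python
def leaf_weight(residuals, reg_lambda):
    """Regularized leaf value: sum of residuals / (count + lambda).

    With reg_lambda = 0 this is the plain mean used by ordinary boosting;
    a larger reg_lambda shrinks the leaf value toward zero.
    """
    return sum(residuals) / (len(residuals) + reg_lambda)


def split_gain(left, right, reg_lambda, gamma):
    """Gain of splitting a node into left/right; gamma is the cost of a split."""
    def score(rs):
        return sum(rs) ** 2 / (len(rs) + reg_lambda)
    return 0.5 * (score(left) + score(right) - score(left + right)) - gamma


# Hypothetical residuals falling on the two sides of a candidate split.
left, right = [-3.0, -3.0], [3.0, 3.0]
print(leaf_weight(right, reg_lambda=0.0))            # plain mean of the leaf
print(leaf_weight(right, reg_lambda=1.0))            # shrunken leaf value
print(split_gain(left, right, reg_lambda=1.0, gamma=0.0))
```

A split is kept only when its gain is positive, so raising `gamma` prunes weak splits and raising `reg_lambda` damps leaf values; both limit over-fitting.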





The data used in this paper is the YouTube trending videos dataset. It includes
several features, such as the title of the program, channel name, views, likes,
and dislikes. A summary of the dataset is given in Table 1.

































Table 1: Summary of dataset




We use RStudio to implement the required k-medoids clustering, gradient
boosting, and extreme gradient boosting algorithms in this paper. First, the
k-medoids algorithm is used to split the data into four clusters.


Fig-2. k-medoids clusters


The clusters obtained from the above algorithm are used in gradient boosting,
and the predictions are made according to these clusters.

The clusters are then used in the extreme gradient boosting model to predict
the popularity of all the programs. Compared with the gradient boosting
algorithm, the extreme gradient boosting algorithm gives better results.
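
One way to make such a comparison concrete is to compute an error measure for each model within each cluster. The paper does not state which measure was used, so the root mean squared error (RMSE) below is an assumption, as are the function names and the numbers, which are invented purely to show the call pattern and are not experimental results.

```python
import math
from collections import defaultdict


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


def rmse_by_cluster(labels, actual, predicted):
    """RMSE computed separately for the samples of each cluster label."""
    grouped = defaultdict(lambda: ([], []))
    for label, a, p in zip(labels, actual, predicted):
        grouped[label][0].append(a)
        grouped[label][1].append(p)
    return {label: rmse(a, p) for label, (a, p) in grouped.items()}


# Hypothetical cluster labels, true view counts, and two models' predictions.
labels = [0, 0, 1, 1]
actual = [10, 20, 30, 40]
predictions_a = [12, 18, 33, 37]
predictions_b = [11, 19, 31, 39]
print(rmse_by_cluster(labels, actual, predictions_a))
print(rmse_by_cluster(labels, actual, predictions_b))
```

Reporting the error per cluster shows whether one model is better for every trend or only on average.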




In this paper we have predicted the popularity of programs according to their
publish time. We used the k-medoids algorithm to cluster programs into four
trends, which captures the evolution of program popularity. Gradient boosting
and then extreme gradient boosting were used to forecast popularity, and
extreme gradient boosting gave more accurate predictions than the commonly used
gradient boosting algorithm. The experimental results show a gain in accuracy
over the methods previously used to forecast program popularity among YouTube
videos, and the approach gives consistent predictions much faster.




[1] M. Tsagkias, W. Weerkamp, and M. de Rijke, “News comments: Exploring,
modeling, and online prediction,” in Advances in Information Retrieval. Cham,
Switzerland: Springer, 2010, pp. 191-203.

[2] G. Szabo and B. A. Huberman, “Predicting the popularity of online
content,” Commun. ACM, vol. 53, no. 8, pp. 80-88, 2010.

[3] R. Crane and D. Sornette, “Robust dynamic classes revealed by measuring
the response function of a social system,” Proc. Nat. Acad. Sci. USA, vol. 105,
no. 41, pp. 15649-15653, 2008.

[4] H. Pinto, J. M. Almeida, and M. A. Gonçalves, “Using early view patterns
to predict the popularity of YouTube videos,” in Proc. 6th ACM Int. Conf. Web
Search Data Mining, 2013, pp. 365-374.

[5] C. Zhu, G. Cheng, and K. Wang, “Big Data Analytics for Program Popularity
Prediction in Broadcast TV