It also consists of metadata of movies like genre, language, release year and cast. The audio present in film clips typically consists of a mixture of speech, music, and sound results meant to have interaction the viewers in the stories that filmmakers need to deliver. Actually, the extended Cognimuse dataset consists of famous Hollywood film clips, and hence, speech, sound results in addition to music...