Consider applying PLSI to the following corpus (each line is a separate document):

ABACAA
BCABABB
CACBAB
Furthermore, assume there are two topics and that A, B, and C are the only word types in the vocabulary.
(a) Suppose the words were initially assigned to topics as shown above (black for topic 1, red/underline for topic 2). Calculate the topic-word vectors and the document-topic vectors.
(b) Use the vectors generated in part (a) to calculate the topic posterior probability for each word in the corpus.
(c) Use the result of part (b) to recalculate the topic-word vectors and the document-topic vectors.
(d) Determine whether the vectors from part (c) are better for this set of documents, e.g., by comparing the corpus log-likelihood before and after the update.
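The steps above amount to one EM iteration of PLSI. The sketch below walks through them on this corpus. Since the original color-coding of the initial assignment is not reproducible in plain text, it assumes a hypothetical alternating topic assignment within each document; with the actual assignment from the figure, only the starting counts change.

```python
import math
from collections import defaultdict

# Corpus from the exercise: three documents over the vocabulary {A, B, C}.
docs = ["ABACAA", "BCABABB", "CACBAB"]
topics = [0, 1]

# HYPOTHETICAL initial hard assignment (stand-in for the lost color-coding):
# alternate topics 0, 1, 0, 1, ... within each document.
assign = [[i % 2 for i in range(len(d))] for d in docs]

def m_step(posteriors):
    """Recompute P(w|z) and P(z|d) from (soft) topic counts P(z|d,w)."""
    p_wz = {z: defaultdict(float) for z in topics}
    p_zd = [defaultdict(float) for _ in docs]
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            for z in topics:
                p = posteriors[d][i][z]
                p_wz[z][w] += p
                p_zd[d][z] += p
    for z in topics:                       # normalize topic-word vectors
        total = sum(p_wz[z].values())
        for w in p_wz[z]:
            p_wz[z][w] /= total
    for d, doc in enumerate(docs):         # normalize document-topic vectors
        for z in topics:
            p_zd[d][z] /= len(doc)
    return p_wz, p_zd

# Part (a): a hard assignment is a one-hot posterior for every token.
hard = [[[1.0 if assign[d][i] == z else 0.0 for z in topics]
         for i in range(len(doc))] for d, doc in enumerate(docs)]
p_wz, p_zd = m_step(hard)

# Part (b): E-step -- P(z|d,w) is proportional to P(z|d) * P(w|z).
soft = []
for d, doc in enumerate(docs):
    rows = []
    for w in doc:
        scores = [p_zd[d][z] * p_wz[z][w] for z in topics]
        norm = sum(scores)
        rows.append([s / norm for s in scores])
    soft.append(rows)

# Part (c): M-step again, now with the soft posteriors.
p_wz2, p_zd2 = m_step(soft)

# Part (d): compare corpus log-likelihood before and after the update.
def log_lik(p_wz, p_zd):
    return sum(math.log(sum(p_zd[d][z] * p_wz[z][w] for z in topics))
               for d, doc in enumerate(docs) for w in doc)
```

EM guarantees that the log-likelihood after the update in part (c) is no worse than before, which is the comparison part (d) asks for.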
Now consider this corpus (each line is a separate sentence):

ABCCC
ADBB
CDADD
CABB
DACB
Suppose we want to build a bigram model from the corpus above. Assume each sentence is padded with a begin-of-sentence symbol and an end-of-sentence symbol.
Calculate the perplexity of each sentence (separately) for each of the two cases:
1. The base case (no smoothing).
2. Laplace (add-one) smoothing.
Also show the probability of each bigram (preferably as a 2-D matrix with histories as rows and continuations as columns).
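The bigram counting, the two probability estimates, and the per-sentence perplexity can be sketched as below. This assumes `<s>`/`</s>` as the begin/end symbols and that `</s>` is included among the possible continuations when adding one to every bigram count for Laplace smoothing.

```python
import math
from collections import Counter

# Corpus from the exercise: five sentences over the vocabulary {A, B, C, D}.
sentences = ["ABCCC", "ADBB", "CDADD", "CABB", "DACB"]
vocab = ["A", "B", "C", "D"]
BOS, EOS = "<s>", "</s>"

# Collect bigram counts and history (unigram) counts, padding each
# sentence with the begin/end symbols.
bigrams, unigrams = Counter(), Counter()
for s in sentences:
    toks = [BOS] + list(s) + [EOS]
    unigrams.update(toks[:-1])      # histories: </s> never starts a bigram
    bigrams.update(zip(toks, toks[1:]))

def prob(w1, w2, smooth=False):
    """P(w2 | w1); Laplace smoothing adds 1 to every bigram count."""
    V = len(vocab) + 1              # possible continuations: vocab plus </s>
    if smooth:
        return (bigrams[(w1, w2)] + 1) / (unigrams[w1] + V)
    return bigrams[(w1, w2)] / unigrams[w1]

def perplexity(sentence, smooth=False):
    """Per-bigram perplexity of one padded sentence."""
    toks = [BOS] + list(sentence) + [EOS]
    n = len(toks) - 1               # number of bigram probabilities
    ll = sum(math.log(prob(a, b, smooth)) for a, b in zip(toks, toks[1:]))
    return math.exp(-ll / n)

# 2-D matrix of smoothed bigram probabilities:
# rows are histories (<s> and vocab), columns are continuations (vocab and </s>).
matrix = {h: {w: prob(h, w, smooth=True) for w in vocab + [EOS]}
          for h in [BOS] + vocab}
```

In the unsmoothed case every bigram of a training sentence has a nonzero count, so the base-case perplexities are well defined here; smoothing only matters once an unseen bigram appears.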