I love this section: it explains the principles behind the most popular text-generation methods in NLP.

We explain how text decoding works using MindNLP.

Auto-regressive language model: the model predicts each token conditioned on the tokens generated before it.

The decoding methods supplied by MindNLP:

Greedy search: at every step, simply pick the single most probable next token.
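Conceptually, greedy decoding is just a repeated argmax. A minimal sketch over a toy next-token probability table (the vocabulary and numbers are invented purely for illustration):

```python
# Toy next-token probability table; words and numbers are made up.
probs = {
    '<s>': {'I': 0.6, 'The': 0.4},
    'I': {'enjoy': 0.5, 'like': 0.3, 'walk': 0.2},
    'enjoy': {'walking': 0.7, 'running': 0.3},
    'walking': {'</s>': 1.0},
}

def greedy_decode(start='<s>', max_len=10):
    token, output = start, []
    for _ in range(max_len):
        # Greedy search: always take the single most probable next token.
        token = max(probs[token], key=probs[token].get)
        if token == '</s>':
            break
        output.append(token)
    return output

print(greedy_decode())  # ['I', 'enjoy', 'walking']
```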

Beam search:

Keep the several most probable hypotheses (num_beams of them, e.g. num_beams = 2) at each step, and finally return the highest-scoring sequence:
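Here is a pure-Python sketch of the idea with num_beams = 2 over a toy table (all words and probabilities are invented). Note how beam search recovers the high-probability sequence "A dog", which greedy search, committing to "The" at the first step, would miss:

```python
# Toy next-token probability table; everything here is made up.
probs = {
    '<s>': {'The': 0.5, 'A': 0.5},
    'The': {'dog': 0.4, 'cat': 0.6},
    'A': {'dog': 0.9, 'cat': 0.1},
    'dog': {'</s>': 1.0},
    'cat': {'</s>': 1.0},
}

def beam_search(num_beams=2, max_len=5):
    # Each beam is (token sequence, score = product of step probabilities).
    beams = [(['<s>'], 1.0)]
    for _ in range(max_len):
        candidates = []
        for tokens, score in beams:
            if tokens[-1] == '</s>':          # finished beams carry over
                candidates.append((tokens, score))
                continue
            for tok, p in probs[tokens[-1]].items():
                candidates.append((tokens + [tok], score * p))
        # Keep only the num_beams highest-scoring hypotheses.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:num_beams]
        if all(t[-1] == '</s>' for t, _ in beams):
            break
    return beams[0][0]

print(beam_search())  # ['<s>', 'A', 'dog', '</s>']
```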

from mindnlp.transformers import GPT2Tokenizer, GPT2LMHeadModel

tokenizer = GPT2Tokenizer.from_pretrained('iiBcai/gpt2', mirror='modelscope')
model = GPT2LMHeadModel.from_pretrained('iiBcai/gpt2', pad_token_id=tokenizer.eos_token_id, mirror='modelscope')

input_ids = tokenizer.encode('I enjoy walking with my cute dog', return_tensors='ms')

# Beam search with 5 beams, stopping early once all beams are finished.
beam_output = model.generate(
    input_ids,
    max_length=50,
    num_beams=5,
    early_stopping=True
)
print('Output:\n' + 100 * '-')
print(tokenizer.decode(beam_output[0], skip_special_tokens=True))
print(100 * '-')

# Add n-gram blocking: no 2-gram may appear twice in the output.
beam_output = model.generate(
    input_ids,
    max_length=50,
    num_beams=5,
    no_repeat_ngram_size=2,
    early_stopping=True
)
print('Beam search with ngram, Output:\n' + 100 * '-')
print(tokenizer.decode(beam_output[0], skip_special_tokens=True))
print(100 * '-')
# Return the 5 best beams instead of only the single best one.
beam_outputs = model.generate(
    input_ids,
    max_length=50,
    num_beams=5,
    no_repeat_ngram_size=2,
    num_return_sequences=5,
    early_stopping=True
)

print('return_num_sequences, Output:\n' + 100 * '-')
for i, beam_output in enumerate(beam_outputs):
    print("{}: {}".format(i, tokenizer.decode(beam_output, skip_special_tokens=True)))
print(100 * '-')

These settings already give noticeably better output.

n-gram: an n-gram model predicts a word from the n - 1 words preceding it, making it a type of Markov model.

no_repeat_ngram_size: forbid any n-gram from appearing more than once, by setting the probability of every token that would complete a repeated n-gram to 0.
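The blocking logic can be sketched in a few lines (here with n = 2; the example sequence is made up): given the last n - 1 generated tokens, ban every token that would recreate an n-gram already present in the sequence.

```python
# Sketch of no_repeat_ngram_size-style blocking for n-grams.
def banned_tokens(generated, n=2):
    # The current (n-1)-gram prefix that the next token would extend.
    prefix = tuple(generated[-(n - 1):])
    banned = set()
    # Ban every token that followed this same prefix earlier in the text.
    for i in range(len(generated) - n + 1):
        if tuple(generated[i:i + n - 1]) == prefix:
            banned.add(generated[i + n - 1])
    return banned

# 'my dog' already occurred, so after 'my' the token 'dog' is banned.
print(banned_tokens(['my', 'dog', 'is', 'my']))  # {'dog'}
```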

Sampling:

Randomly draw the next word from the model's current conditional distribution.

temperature: at a high temperature the candidate words' probabilities are flattened and look more alike (so the choice is more random); at a low temperature their probabilities differ more sharply, so sampling concentrates on the top words.
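The effect of temperature is simply a rescaling of the logits before the softmax. A small self-contained sketch (the logits are made up):

```python
import math

# Softmax with temperature: dividing made-up logits by a larger
# temperature flattens the distribution; a smaller one sharpens it.
def softmax_t(logits, temperature):
    exps = [math.exp(l / temperature) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]
print(softmax_t(logits, 0.5))  # sharp: the top token dominates
print(softmax_t(logits, 2.0))  # flat: probabilities are much closer
```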

import mindspore

mindspore.set_seed(1234)
# Pure sampling from the full distribution (top_k=0 disables top-k filtering).
sample_output = model.generate(
    input_ids,
    do_sample=True,
    max_length=50,
    top_k=0,
    temperature=0.7
)
print('Output:\n' + 100 * '-')
print(tokenizer.decode(sample_output[0], skip_special_tokens=True))

Top-K sampling: keep only the K most probable words, renormalize their probabilities, and sample from them.

Top-P (nucleus) sampling:

Keep the smallest set of top words whose cumulative probability exceeds p, then renormalize and sample from that set.

Combining top-k and top-p:
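Conceptually, the combined filter first keeps the k most probable words, then shrinks that set further to the smallest prefix whose cumulative probability reaches p, and renormalizes. A sketch over a toy distribution (all words and numbers are invented):

```python
# Sketch of combined top-k + top-p (nucleus) filtering.
def filter_top_k_top_p(probs, k, p):
    # Step 1: keep the k most probable tokens.
    items = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    # Step 2: keep the smallest prefix whose cumulative probability >= p.
    kept, cum = [], 0.0
    for token, prob in items:
        kept.append((token, prob))
        cum += prob
        if cum >= p:
            break
    # Step 3: renormalize the survivors into a proper distribution.
    total = sum(prob for _, prob in kept)
    return {token: prob / total for token, prob in kept}

probs = {'walking': 0.4, 'running': 0.3, 'sleeping': 0.2, 'coding': 0.1}
# Keeps 'walking' and 'running' (0.4 + 0.3 >= 0.6), renormalized.
print(filter_top_k_top_p(probs, k=3, p=0.6))
```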

# Combine top-k and top-p filtering, returning 3 independent samples.
sample_outputs = model.generate(
    input_ids,
    do_sample=True,
    max_length=50,
    top_k=5,
    top_p=0.95,
    num_return_sequences=3
)
print('Output:\n' + 100 * '-')
for i, sample_output in enumerate(sample_outputs):
    print("{}: {}".format(i, tokenizer.decode(sample_output, skip_special_tokens=True)))
