Parameter Configuration Description

1. Configuration instructions

Combining the few-shot and cot parameters yields four evaluation methods:

few-shot is False and cot is False: zero-shot evaluation; the model gives only the answer.
few-shot is True and cot is False: few-shot evaluation; the model gives only the answer.
few-shot is False and cot is True: zero-shot evaluation; the model answers using chain-of-thought (CoT) reasoning.
few-shot is True and cot is True: few-shot evaluation; the model answers using CoT reasoning.
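
The four combinations above can be summarized in a few lines of Python. This is only an illustrative sketch: the function name `evaluation_mode` and the label strings are hypothetical and are not taken from the evaluation code itself.

```python
def evaluation_mode(few_shot: bool, cot: bool) -> str:
    """Name the evaluation method selected by the two flags (illustrative only)."""
    # few_shot decides whether example questions are placed in the prompt.
    shots = "few-shot" if few_shot else "zero-shot"
    # cot decides whether the model reasons step by step or gives only the answer.
    answer_style = "CoT" if cot else "answer-only"
    return f"{shots}, {answer_style}"


# Enumerate all four combinations of the two flags.
for few_shot in (False, True):
    for cot in (False, True):
        print(f"few_shot={few_shot}, cot={cot}: {evaluation_mode(few_shot, cot)}")
```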

few-shot or zero-shot?

Generally speaking, a model that has only been pretrained performs better with few-shot prompts than with zero-shot prompts. For a model that has been instruction-tuned, however, zero-shot is likely to perform better if the instruction-tuning data contained no few-shot examples.

Each model_type corresponds to a different model-loading configuration, that is, a model class and a tokenizer class. Choose model_type from the following:

```
"bloom": (BloomForCausalLM, BloomTokenizerFast),
"chatglm": (AutoModel, AutoTokenizer),
"llama": (LlamaForCausalLM, LlamaTokenizer),
"baichuan": (AutoModelForCausalLM, AutoTokenizer),
"auto": (AutoModelForCausalLM, AutoTokenizer),
"moss": (AutoConfig, AutoTokenizer)
```
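
As a rough sketch of how a model_type value might be validated against this table, the example below keeps the class names as plain strings so that it runs without transformers installed. The names `MODEL_CLASS_NAMES` and `resolve_model_classes` are assumptions made for illustration; the actual evaluation code may organize this lookup differently.

```python
from typing import Dict, Tuple

# Class names copied from the table above, stored as strings so the sketch
# has no dependency on the transformers package.
MODEL_CLASS_NAMES: Dict[str, Tuple[str, str]] = {
    "bloom": ("BloomForCausalLM", "BloomTokenizerFast"),
    "chatglm": ("AutoModel", "AutoTokenizer"),
    "llama": ("LlamaForCausalLM", "LlamaTokenizer"),
    "baichuan": ("AutoModelForCausalLM", "AutoTokenizer"),
    "auto": ("AutoModelForCausalLM", "AutoTokenizer"),
    "moss": ("AutoConfig", "AutoTokenizer"),
}


def resolve_model_classes(model_type: str) -> Tuple[str, str]:
    """Return the (model class, tokenizer class) names for a model_type."""
    try:
        return MODEL_CLASS_NAMES[model_type]
    except KeyError:
        supported = ", ".join(sorted(MODEL_CLASS_NAMES))
        raise ValueError(
            f"Unknown model_type {model_type!r}; choose from: {supported}"
        ) from None


model_cls, tokenizer_cls = resolve_model_classes("llama")
print(model_cls, tokenizer_cls)
```

Failing early with the list of supported keys makes a mistyped --model_type easier to diagnose than an error raised later during model loading.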

2. Model configuration information

The following is the model configuration information:

--model_type Model type; one of the model_type values listed above
--model_path Path to the model
--cot Whether to use chain-of-thought (CoT)
--few_shot Whether to use few-shot learning
--with_prompt Whether to use the alpaca prompt template; not used by default
--ntrain Number of few-shot examples; ignored when few_shot is False
--constrained_decoding Whether to use constrained decoding. Because the reference answers in fineval are A, B, C, or D, two schemes are provided for extracting the answer from the model: when constrained_decoding=True, compute the probability of each of A, B, C, and D being the first token generated by the model and select the most probable one as the answer; when constrained_decoding=False, use a regular expression to extract the answer from the model-generated text.
--temperature Temperature for model decoding
--n_times Number of times to repeat the evaluation; a folder is generated under output_dir for each repetition. The default is 1, and the generated folder is toke0
--do_save_csv Whether to save the model generation results, extracted answers, etc. to a CSV file
--do_test Which split to evaluate on: when do_test=False, evaluate on the valid set; when do_test=True, evaluate on the test set
--gpus Number of GPUs used for model testing
--only_cpu Whether to use only the CPU for evaluation (for example, --only_cpu True)
--output_dir Specify the output path of the evaluation results
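
The two answer-extraction schemes described under --constrained_decoding can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's implementation: the function names, the dictionary of first-token scores, and the regular expression are all hypothetical, and the real code would obtain the scores from the model's output for the first generated token.

```python
import re
from typing import Dict, Optional

CHOICES = ("A", "B", "C", "D")

# Hypothetical pattern: a single option letter that is not part of a longer
# ASCII word. The pattern used by the actual evaluation code is not known here.
ANSWER_PATTERN = re.compile(r"(?<![A-Za-z])([ABCD])(?![A-Za-z])")


def pick_constrained(first_token_scores: Dict[str, float]) -> str:
    """constrained_decoding=True: compare only the four option tokens.

    first_token_scores maps each option letter to the probability (or logit)
    of that letter being the first generated token.
    """
    scores = {choice: first_token_scores[choice] for choice in CHOICES}
    # The option with the highest score is taken as the answer.
    return max(scores, key=scores.get)


def extract_unconstrained(generated_text: str) -> Optional[str]:
    """constrained_decoding=False: pull the answer out of free-form text."""
    match = ANSWER_PATTERN.search(generated_text)
    return match.group(1) if match else None


print(pick_constrained({"A": 0.1, "B": 0.6, "C": 0.2, "D": 0.1}))
print(extract_unconstrained("The answer is B."))
```

The constrained scheme always returns one of the four options, while the regular-expression scheme can return nothing when the generated text contains no recognizable option letter, so callers need to handle that case.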