Requirements
AG2 requires Python>=3.9. To run this notebook example, please install
with the [blendsearch] option:
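A typical install command looks like the following; this assumes the package is published as `ag2` on PyPI (check the project README for the exact package name in your setup):

```shell
pip install "ag2[blendsearch]"
```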
Note: For code corresponding to versions <0.2, please refer to the repository.
Set your API Endpoint
- The `config_list_openai_aoai` function tries to create a list of configurations using Azure OpenAI endpoints and OpenAI endpoints. It assumes the API keys and API bases are stored in the corresponding environment variables or local txt files:
  - OpenAI API key: `os.environ["OPENAI_API_KEY"]` or `openai_api_key_file="key_openai.txt"`.
  - Azure OpenAI API key: `os.environ["AZURE_OPENAI_API_KEY"]` or `aoai_api_key_file="key_aoai.txt"`. Multiple keys can be stored, one per line.
  - Azure OpenAI API base: `os.environ["AZURE_OPENAI_API_BASE"]` or `aoai_api_base_file="base_aoai.txt"`. Multiple bases can be stored, one per line.
- The `config_list_from_json` function loads a list of configurations from an environment variable or a JSON file. It first looks for the environment variable `env_or_file`, which must be a valid JSON string. If that variable is not found, it looks for a JSON file with the same name. It filters the configs by `filter_dict`.
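For illustration, here is a minimal sketch of the JSON shape such a config list takes and how a `filter_dict` narrows it down. The filtering logic below is a simplified stand-in for AG2's actual implementation, and the model names and keys are placeholders:

```python
import json
import os

# A JSON list of config dicts stored in an environment variable
# (a file named OAI_CONFIG_LIST would be picked up the same way).
os.environ["OAI_CONFIG_LIST"] = json.dumps(
    [
        {"model": "gpt-4", "api_key": "<your OpenAI API key>"},
        {"model": "gpt-3.5-turbo", "api_key": "<your OpenAI API key>"},
    ]
)

configs = json.loads(os.environ["OAI_CONFIG_LIST"])

# Simplified stand-in for filter_dict-based filtering: keep configs whose
# fields match one of the allowed values.
filter_dict = {"model": ["gpt-4"]}
filtered = [
    c for c in configs
    if all(c.get(k) in allowed for k, allowed in filter_dict.items())
]
print([c["model"] for c in filtered])  # -> ['gpt-4']
```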
Load dataset
First, we load the HumanEval dataset. The dataset contains 164 examples. We use the first 20 for tuning the generation hyperparameters and the remaining ones for evaluation. In each example, the "prompt" is the prompt string for eliciting the code generation (renamed to "definition"), "test" is the Python code of the unit test for the example, and "entry_point" is the name of the function to be tested.

Define Success Metric
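As a concrete (toy) illustration of the metric used in this notebook: if we record, for each task, whether the finally selected response passed its unit test, the mean success rate is simply the fraction of solved tasks:

```python
# Toy illustration of the success metric: one boolean per task, True when
# the selected generation passed that task's unit test.
def mean_success_rate(solved):
    return sum(solved) / len(solved)

print(mean_success_rate([True, True, False, True]))  # -> 0.75
```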
Before we start tuning, we need to define the success metric we want to optimize. For each code generation task, we can use the model to generate multiple candidates and then select one of them. If the selected response passes the unit test, we consider the task successfully solved. We then define the mean success rate over a collection of tasks.

Use the tuning data to find a good configuration
AG2 provides an API for hyperparameter optimization of OpenAI models: `autogen.Completion.tune`. To make a request with the tuned config, use `autogen.Completion.create`.
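The call shape looks roughly like the following. This is a hedged sketch: the argument names follow the historical `autogen.Completion` API and may differ in your version, `tune_data` and `eval_with_generated_assertions` are hypothetical names for your tuning split and evaluation function, and the block is guarded so it does not spend any API budget here:

```python
RUN = False  # flip to True in an environment with autogen installed and API keys configured

if RUN:
    import autogen

    # Tune generation hyperparameters on the tuning split; eval_func scores
    # each set of generated responses against the unit tests.
    config, analysis = autogen.Completion.tune(
        data=tune_data,                            # hypothetical: first 20 HumanEval tasks
        metric="success",                          # metric reported by eval_func
        mode="max",
        eval_func=eval_with_generated_assertions,  # hypothetical evaluation function
        inference_budget=0.02,
        optimization_budget=5,
        num_samples=-1,
    )

    # Make a request with the tuned config for one example task.
    response = autogen.Completion.create(context=tune_data[1], **config)
```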
For (local) reproducibility and cost efficiency, we cache responses from
OpenAI with a controllable seed. You can change `cache_path_root` from ".cache" to a different path in `set_cache()`.
The caches for different seeds are stored separately.
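As an illustration of that layout (a stand-in sketch, not AG2's actual cache code), keying the cache directory by seed is what keeps responses for different seeds separate:

```python
import hashlib
import json
import os
import tempfile

# Stand-in illustration: one cache directory per seed, with each request
# hashed to a stable filename. Different seeds never share entries.
def cache_file(cache_path_root, seed, request):
    key = hashlib.sha256(json.dumps(request, sort_keys=True).encode()).hexdigest()
    seed_dir = os.path.join(cache_path_root, str(seed))
    os.makedirs(seed_dir, exist_ok=True)
    return os.path.join(seed_dir, key + ".json")

root = tempfile.mkdtemp()
req = {"prompt": "def add(a, b):", "temperature": 0.7}
p41 = cache_file(root, 41, req)
p42 = cache_file(root, 42, req)
print(p41 != p42)  # -> True: same request, different seeds, separate entries
```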
Perform tuning
The tuning will take a while to finish, depending on the optimization budget. The tuning is performed under the specified optimization budgets.

- `inference_budget` is the target average inference budget per instance in the benchmark. For example, 0.02 means the target inference budget is 0.02 dollars, which translates to 1000 tokens (input + output combined) if the text Davinci model is used.
- `optimization_budget` is the total budget allowed for the tuning. For example, 5 means 5 dollars are allowed in total, which translates to 250K tokens for the text Davinci model.
- `num_samples` is the number of different hyperparameter configurations allowed to be tried. The tuning will stop after either `num_samples` trials or after `optimization_budget` dollars are spent, whichever happens first. -1 means no hard restriction on the number of trials; the actual number is decided by `optimization_budget`.
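The dollar-to-token conversions above can be sanity-checked with a quick calculation, assuming the text Davinci price of $0.02 per 1K tokens (input + output combined):

```python
price_per_1k_tokens = 0.02  # assumed text-davinci price in dollars

inference_budget = 0.02     # dollars per benchmark instance
optimization_budget = 5     # dollars in total for tuning

tokens_per_instance = inference_budget / price_per_1k_tokens * 1000
total_tuning_tokens = optimization_budget / price_per_1k_tokens * 1000
print(tokens_per_instance, total_tuning_tokens)  # -> 1000.0 250000.0
```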
Output tuning results
After the tuning, we can print out the config and the result found by autogen:

Request with the tuned config
We can apply the tuned config to the request for an example task:

Evaluate the success rate on the test data
You can use `autogen.Completion.test` to evaluate the performance of an
entire dataset with the tuned config. The following code will take a
while to evaluate all 144 test data instances. The cost is about $6
if you uncomment and run it.