Use AG2 to Tune OpenAI Models
AG2 requires Python>=3.9. To run this notebook example, please install AG2
with the [blendsearch] option.
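For example (the package and extra names here are assumptions; check the AG2 docs for the exact install command matching your version):

```shell
# Assumed package/extra names; verify against your AG2 version's docs.
pip install "ag2[blendsearch]"
```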
Note: For code corresponding to versions <0.2, please refer to the repository.
The `config_list_openai_aoai`
function tries to create a list of configurations using Azure OpenAI
endpoints and OpenAI endpoints. It assumes the API keys and API
bases are stored in the corresponding environment variables or local
txt files:

- OpenAI API key: `openai_api_key_file="key_openai.txt"`.
- Azure OpenAI API key: `aoai_api_key_file="key_aoai.txt"`. Multiple keys can be stored, one per line.
- Azure OpenAI API base: `aoai_api_base_file="base_aoai.txt"`. Multiple bases can be stored, one per line.

The `config_list_from_json`
function loads a list of configurations from an environment variable
or a JSON file. It first looks for the environment variable
`env_or_file`, which must be a valid JSON string. If that variable
is not found, it then looks for a JSON file with the same name. It
filters the configs by `filter_dict`.

Use `autogen.Completion.tune` to perform the tuning, and
`autogen.Completion.create` to make a request with the tuned config.
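To illustrate the lookup-and-filter behavior described above, here is a minimal self-contained sketch. It mimics what `autogen.config_list_from_json` does rather than calling the real function, and the config fields and the `OAI_CONFIG_LIST` contents are hypothetical:

```python
import json
import os

# Sketch of the env-var-then-file lookup and filter_dict filtering
# described above (mimics autogen.config_list_from_json; not the real code).
def load_config_list(env_or_file, filter_dict=None):
    if env_or_file in os.environ:
        configs = json.loads(os.environ[env_or_file])  # env var holds a JSON string
    else:
        with open(env_or_file) as f:                   # else a JSON file of that name
            configs = json.load(f)
    if filter_dict:                                    # keep configs matching every key
        configs = [c for c in configs
                   if all(c.get(k) in allowed for k, allowed in filter_dict.items())]
    return configs

# Hypothetical example configs:
os.environ["OAI_CONFIG_LIST"] = json.dumps([
    {"model": "gpt-4", "api_key": "<openai-key>"},
    {"model": "gpt-35-turbo", "api_key": "<aoai-key>", "base_url": "<aoai-base>"},
])
gpt4_only = load_config_list("OAI_CONFIG_LIST", filter_dict={"model": {"gpt-4"}})
print(len(gpt4_only))  # 1
```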
For (local) reproducibility and cost efficiency, we cache responses from
OpenAI with a controllable seed. You can change `cache_path_root`
from ".cache" to a different path in `set_cache()`.
The caches for different seeds are stored separately.
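As a rough sketch of the idea (illustrative only, not AG2's actual cache code, which lives behind `autogen.Completion.set_cache`): responses are stored under a per-seed subdirectory of the cache root, so different seeds never collide:

```python
import hashlib
import json
import os
import tempfile

# Hedged sketch: cache a response keyed by the request, under one
# subdirectory per seed, so runs with different seeds stay separate.
def cached_call(request, seed, cache_path_root):
    key = hashlib.sha256(json.dumps(request, sort_keys=True).encode()).hexdigest()
    seed_dir = os.path.join(cache_path_root, str(seed))  # one subdir per seed
    os.makedirs(seed_dir, exist_ok=True)
    path = os.path.join(seed_dir, key + ".json")
    if os.path.exists(path):                             # cache hit: replay
        with open(path) as f:
            return json.load(f)
    response = {"echo": request}                         # stand-in for an OpenAI call
    with open(path, "w") as f:
        json.dump(response, f)
    return response

root = tempfile.mkdtemp()
r1 = cached_call({"prompt": "hi"}, seed=41, cache_path_root=root)
r2 = cached_call({"prompt": "hi"}, seed=41, cache_path_root=root)  # served from cache
```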
- `inference_budget` is the target average inference budget per
instance in the benchmark. For example, 0.02 means the target
inference budget is 0.02 dollars, which translates to 1000 tokens
(input + output combined) if the text Davinci model is used.
- `optimization_budget` is the total budget allowed to perform the
tuning. For example, 5 means 5 dollars are allowed in total, which
translates to 250K tokens for the text Davinci model.
- `num_samples` is the number of different hyperparameter
configurations allowed to be tried. The tuning will stop after
either `num_samples` trials or after `optimization_budget` dollars
have been spent, whichever happens first. -1 means no hard restriction on the
number of trials; the actual number is decided by
`optimization_budget`.
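The dollar-to-token conversions above can be sanity-checked with quick arithmetic, assuming the text Davinci price of $0.02 per 1K tokens that those numbers imply:

```python
# Price implied by the text: $0.02 per 1K tokens (input + output combined).
price_per_token = 0.02 / 1000

inference_budget = 0.02    # dollars per benchmark instance
optimization_budget = 5    # dollars total for the tuning

tokens_per_instance = round(inference_budget / price_per_token)
total_tuning_tokens = round(optimization_budget / price_per_token)
print(tokens_per_instance)  # 1000
print(total_tuning_tokens)  # 250000
```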
Use `autogen.Completion.test`
to evaluate the performance of an
entire dataset with the tuned config. The following code will take a
while to evaluate all 144 test data instances. The cost is about $6
if you uncomment it and run it.
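Conceptually, this dataset-level evaluation just averages a per-instance success metric over the data; `autogen.Completion.test` plays this role with the real model. A toy sketch with hypothetical `generate` and `success` stand-ins:

```python
# Minimal sketch of dataset-level evaluation: apply a generator to each
# instance and average a success metric. `generate` and `success` are
# hypothetical stand-ins for the tuned model and the eval function.
def evaluate(data, generate, success):
    results = [success(generate(x), x) for x in data]
    return sum(results) / len(results)

# Toy illustration (the third instance has a deliberately wrong answer):
data = [{"q": "1+1", "a": "2"}, {"q": "2+2", "a": "4"}, {"q": "3+3", "a": "7"}]
score = evaluate(
    data,
    generate=lambda x: str(eval(x["q"])),   # pretend "model"
    success=lambda out, x: out == x["a"],
)
print(score)  # 2 of 3 correct
```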