Open In Colab Open on GitHub

In GroupChat, we can customize the speaker selection by passing a function to the GroupChat object. With this function, you can build a more deterministic agent workflow. We recommend following a StateFlow pattern when crafting this function. Please refer to the StateFlow blog for more details.

An example research workflow

We provide a simple example to build a StateFlow model for research with customized speaker selection.

We first define the following agents:

  • Initializer: Start the workflow by sending a task.
  • Coder: Retrieve papers from the internet by writing code.
  • Executor: Execute the code.
  • Scientist: Read the papers and write a summary.

In the figure above, we define a simple workflow for research with 4 states: Init, Retrieve, Research, and End. Within each state, we will call different agents to perform the tasks.

  • Init: We use the initializer to start the workflow.
  • Retrieve: We will first call the coder to write code and then call the executor to execute the code.
  • Research: We will call the scientist to read the papers and write a summary.
  • End: We will end the workflow.

Create your speaker selection function

Below is a skeleton of the speaker selection function. Fill in the function to define the speaker selection logic.

def custom_speaker_selection_func(
    last_speaker: Agent, 
    groupchat: GroupChat
) -> Union[Agent, Literal['auto', 'manual', 'random' 'round_robin'], None]:

    """Define a customized speaker selection function.
    A recommended way is to define a transition for each speaker in the groupchat.

    Parameters:
        - last_speaker: Agent
            The last speaker in the group chat.
        - groupchat: GroupChat
            The GroupChat object
    Return:
        Return one of the following:
        1. an `Agent` class, it must be one of the agents in the group chat.
        2. a string from ['auto', 'manual', 'random', 'round_robin'] to select a default method to use.
        3. None, which indicates the chat should be terminated.
    """
    pass

groupchat = autogen.GroupChat(
    speaker_selection_method=custom_speaker_selection_func,
    ...,
)

The last speaker and the groupchat object are passed to the function. Commonly used variables from groupchat are groupchat.messages and groupchat.agents, which is the message history and the agents in the group chat respectively. You can access other attributes of the groupchat, such as groupchat.allowed_speaker_transitions_dict for pre-defined allowed_speaker_transitions_dict.

import os

import autogen

# Put your api key in the environment variable OPENAI_API_KEY
config_list = [
    {
        "model": "gpt-4-0125-preview",
        "api_key": os.environ["OPENAI_API_KEY"],
    }
]

# You can also create an file called "OAI_CONFIG_LIST" and store your config there
# config_list = autogen.config_list_from_json(
#     "OAI_CONFIG_LIST",
#     filter_dict={
#         "model": ["gpt-4-0125-preview"],
#     },
# )
gpt4_config = {
    "cache_seed": 42,  # change the cache_seed for different trials
    "temperature": 0,
    "config_list": config_list,
    "timeout": 120,
}

initializer = autogen.UserProxyAgent(
    name="Init",
)

coder = autogen.AssistantAgent(
    name="Retrieve_Action_1",
    llm_config=gpt4_config,
    system_message="""You are the Coder. Given a topic, write code to retrieve related papers from the arXiv API, print their title, authors, abstract, and link.
You write python/shell code to solve tasks. Wrap the code in a code block that specifies the script type. The user can't modify your code. So do not suggest incomplete code which requires others to modify. Don't use a code block if it's not intended to be executed by the executor.
Don't include multiple code blocks in one response. Do not ask others to copy and paste the result. Check the execution result returned by the executor.
If the result indicates there is an error, fix the error and output the code again. Suggest the full code instead of partial code or code changes. If the error can't be fixed or if the task is not solved even after the code is executed successfully, analyze the problem, revisit your assumption, collect additional info you need, and think of a different approach to try.
""",
)
executor = autogen.UserProxyAgent(
    name="Retrieve_Action_2",
    system_message="Executor. Execute the code written by the Coder and report the result.",
    human_input_mode="NEVER",
    code_execution_config={
        "last_n_messages": 3,
        "work_dir": "paper",
        "use_docker": False,
    },  # Please set use_docker=True if docker is available to run the generated code. Using docker is safer than running the generated code directly.
)
scientist = autogen.AssistantAgent(
    name="Research_Action_1",
    llm_config=gpt4_config,
    system_message="""You are the Scientist. Please categorize papers after seeing their abstracts printed and create a markdown table with Domain, Title, Authors, Summary and Link""",
)


def state_transition(last_speaker, groupchat):
    messages = groupchat.messages

    if last_speaker is initializer:
        # init -> retrieve
        return coder
    elif last_speaker is coder:
        # retrieve: action 1 -> action 2
        return executor
    elif last_speaker is executor:
        if messages[-1]["content"] == "exitcode: 1":
            # retrieve --(execution failed)--> retrieve
            return coder
        else:
            # retrieve --(execution success)--> research
            return scientist
    elif last_speaker == "Scientist":
        # research -> end
        return None


groupchat = autogen.GroupChat(
    agents=[initializer, coder, executor, scientist],
    messages=[],
    max_round=20,
    speaker_selection_method=state_transition,
)
manager = autogen.GroupChatManager(groupchat=groupchat, llm_config=gpt4_config)
initializer.initiate_chat(
    manager, message="Topic: LLM applications papers from last week. Requirement: 5 - 10 papers from different domains."
)
Init (to chat_manager):

Topic: LLM applications papers from last week. Requirement: 5 - 10 papers from different domains.

--------------------------------------------------------------------------------
Retrieve_Action_1 (to chat_manager):

To retrieve related papers from the arXiv API, we can use Python with the `requests` library to send a query to the API and parse the response. Below is a Python script that searches for papers related to "LLM applications" (Large Language Models applications) from the last week, across different domains, and prints out the required information for 5 to 10 papers.

```python
import requests
from datetime import datetime, timedelta
import feedparser

# Define the base URL for the arXiv API
ARXIV_API_URL = 'http://export.arxiv.org/api/query?'

# Define the search parameters
search_query = 'all:"LLM applications"'
start_date = (datetime.now() - timedelta(days=7)).strftime('%Y%m%d%H%M%S')
end_date = datetime.now().strftime('%Y%m%d%H%M%S')
start = 0
max_results = 10
sort_by = 'submittedDate'
sort_order = 'descending'

# Construct the query
query = f'search_query={search_query}&sortBy={sort_by}&sortOrder={sort_order}&start={start}&max_results={max_results}'

# Send the request to the arXiv API
response = requests.get(ARXIV_API_URL + query)

# Parse the response using feedparser
feed = feedparser.parse(response.content)

# Print the title, authors, abstract, and link of each paper
for entry in feed.entries:
    print("Title:", entry.title)
    print("Authors:", ', '.join(author.name for author in entry.authors))
    print("Abstract:", entry.summary)
    print("Link:", entry.link)
    print("\n")

# Check if we have at least 5 papers, if not, adjust the search or notify
if len(feed.entries) < 5:
    print("Less than 5 papers found. Consider adjusting the search parameters or timeframe.")
```

This script will print the title, authors, abstract, and link for each paper related to "LLM applications" from the last week, up to a maximum of 10 papers. If fewer than 5 papers are found, it will notify the user to consider adjusting the search parameters or timeframe.

--------------------------------------------------------------------------------

>>>>>>>> EXECUTING CODE BLOCK 0 (inferred language is python)...
Retrieve_Action_2 (to chat_manager):

exitcode: 0 (execution succeeded)
Code output: 
Title: PRSA: Prompt Reverse Stealing Attacks against Large Language Models
Authors: Yong Yang, Xuhong Zhang, Yi Jiang, Xi Chen, Haoyu Wang, Shouling Ji, Zonghui Wang
Abstract: Prompt, recognized as crucial intellectual property, enables large language
models (LLMs) to perform specific tasks without the need of fine-tuning,
underscoring their escalating importance. With the rise of prompt-based
services, such as prompt marketplaces and LLM applications, providers often
display prompts' capabilities through input-output examples to attract users.
However, this paradigm raises a pivotal security concern: does the exposure of
input-output pairs pose the risk of potential prompt leakage, infringing on the
intellectual property rights of the developers? To our knowledge, this problem
still has not been comprehensively explored yet. To remedy this gap, in this
paper, we perform the first in depth exploration and propose a novel attack
framework for reverse-stealing prompts against commercial LLMs, namely PRSA.
The main idea of PRSA is that by analyzing the critical features of the
input-output pairs, we mimic and gradually infer (steal) the target prompts. In
detail, PRSA mainly consists of two key phases: prompt mutation and prompt
pruning. In the mutation phase, we propose a prompt attention algorithm based
on differential feedback to capture these critical features for effectively
inferring the target prompts. In the prompt pruning phase, we identify and mask
the words dependent on specific inputs, enabling the prompts to accommodate
diverse inputs for generalization. Through extensive evaluation, we verify that
PRSA poses a severe threat in real world scenarios. We have reported these
findings to prompt service providers and actively collaborate with them to take
protective measures for prompt copyright.
Link: http://arxiv.org/abs/2402.19200v1


Title: Political Compass or Spinning Arrow? Towards More Meaningful Evaluations
  for Values and Opinions in Large Language Models
Authors: Paul Röttger, Valentin Hofmann, Valentina Pyatkin, Musashi Hinck, Hannah Rose Kirk, Hinrich Schütze, Dirk Hovy
Abstract: Much recent work seeks to evaluate values and opinions in large language
models (LLMs) using multiple-choice surveys and questionnaires. Most of this
work is motivated by concerns around real-world LLM applications. For example,
politically-biased LLMs may subtly influence society when they are used by
millions of people. Such real-world concerns, however, stand in stark contrast
to the artificiality of current evaluations: real users do not typically ask
LLMs survey questions. Motivated by this discrepancy, we challenge the
prevailing constrained evaluation paradigm for values and opinions in LLMs and
explore more realistic unconstrained evaluations. As a case study, we focus on
the popular Political Compass Test (PCT). In a systematic review, we find that
most prior work using the PCT forces models to comply with the PCT's
multiple-choice format. We show that models give substantively different
answers when not forced; that answers change depending on how models are
forced; and that answers lack paraphrase robustness. Then, we demonstrate that
models give different answers yet again in a more realistic open-ended answer
setting. We distill these findings into recommendations and open challenges in
evaluating values and opinions in LLMs.
Link: http://arxiv.org/abs/2402.16786v1


Title: Large Language Models as Urban Residents: An LLM Agent Framework for
  Personal Mobility Generation
Authors: Jiawei Wang, Renhe Jiang, Chuang Yang, Zengqing Wu, Makoto Onizuka, Ryosuke Shibasaki, Chuan Xiao
Abstract: This paper introduces a novel approach using Large Language Models (LLMs)
integrated into an agent framework for flexible and efficient personal mobility
generation. LLMs overcome the limitations of previous models by efficiently
processing semantic data and offering versatility in modeling various tasks.
Our approach addresses the critical need to align LLMs with real-world urban
mobility data, focusing on three research questions: aligning LLMs with rich
activity data, developing reliable activity generation strategies, and
exploring LLM applications in urban mobility. The key technical contribution is
a novel LLM agent framework that accounts for individual activity patterns and
motivations, including a self-consistency approach to align LLMs with
real-world activity data and a retrieval-augmented strategy for interpretable
activity generation. In experimental studies, comprehensive validation is
performed using real-world data. This research marks the pioneering work of
designing an LLM agent framework for activity generation based on real-world
human activity data, offering a promising tool for urban mobility analysis.
Link: http://arxiv.org/abs/2402.14744v1


Title: An Evaluation of Large Language Models in Bioinformatics Research
Authors: Hengchuang Yin, Zhonghui Gu, Fanhao Wang, Yiparemu Abuduhaibaier, Yanqiao Zhu, Xinming Tu, Xian-Sheng Hua, Xiao Luo, Yizhou Sun
Abstract: Large language models (LLMs) such as ChatGPT have gained considerable
interest across diverse research communities. Their notable ability for text
completion and generation has inaugurated a novel paradigm for
language-interfaced problem solving. However, the potential and efficacy of
these models in bioinformatics remain incompletely explored. In this work, we
study the performance LLMs on a wide spectrum of crucial bioinformatics tasks.
These tasks include the identification of potential coding regions, extraction
of named entities for genes and proteins, detection of antimicrobial and
anti-cancer peptides, molecular optimization, and resolution of educational
bioinformatics problems. Our findings indicate that, given appropriate prompts,
LLMs like GPT variants can successfully handle most of these tasks. In
addition, we provide a thorough analysis of their limitations in the context of
complicated bioinformatics tasks. In conclusion, we believe that this work can
provide new perspectives and motivate future research in the field of LLMs
applications, AI for Science and bioinformatics.
Link: http://arxiv.org/abs/2402.13714v1


Title: Privacy-Preserving Instructions for Aligning Large Language Models
Authors: Da Yu, Peter Kairouz, Sewoong Oh, Zheng Xu
Abstract: Service providers of large language model (LLM) applications collect user
instructions in the wild and use them in further aligning LLMs with users'
intentions. These instructions, which potentially contain sensitive
information, are annotated by human workers in the process. This poses a new
privacy risk not addressed by the typical private optimization. To this end, we
propose using synthetic instructions to replace real instructions in data
annotation and model fine-tuning. Formal differential privacy is guaranteed by
generating those synthetic instructions using privately fine-tuned generators.
Crucial in achieving the desired utility is our novel filtering algorithm that
matches the distribution of the synthetic instructions to that of the real
ones. In both supervised fine-tuning and reinforcement learning from human
feedback, our extensive experiments demonstrate the high utility of the final
set of synthetic instructions by showing comparable results to real
instructions. In supervised fine-tuning, models trained with private synthetic
instructions outperform leading open-source models such as Vicuna.
Link: http://arxiv.org/abs/2402.13659v1


Title: Ain't Misbehavin' -- Using LLMs to Generate Expressive Robot Behavior in
  Conversations with the Tabletop Robot Haru
Authors: Zining Wang, Paul Reisert, Eric Nichols, Randy Gomez
Abstract: Social robots aim to establish long-term bonds with humans through engaging
conversation. However, traditional conversational approaches, reliant on
scripted interactions, often fall short in maintaining engaging conversations.
This paper addresses this limitation by integrating large language models
(LLMs) into social robots to achieve more dynamic and expressive conversations.
We introduce a fully-automated conversation system that leverages LLMs to
generate robot responses with expressive behaviors, congruent with the robot's
personality. We incorporate robot behavior with two modalities: 1) a
text-to-speech (TTS) engine capable of various delivery styles, and 2) a
library of physical actions for the robot. We develop a custom,
state-of-the-art emotion recognition model to dynamically select the robot's
tone of voice and utilize emojis from LLM output as cues for generating robot
actions. A demo of our system is available here. To illuminate design and
implementation issues, we conduct a pilot study where volunteers chat with a
social robot using our proposed system, and we analyze their feedback,
conducting a rigorous error analysis of chat transcripts. Feedback was
overwhelmingly positive, with participants commenting on the robot's empathy,
helpfulness, naturalness, and entertainment. Most negative feedback was due to
automatic speech recognition (ASR) errors which had limited impact on
conversations. However, we observed a small class of errors, such as the LLM
repeating itself or hallucinating fictitious information and human responses,
that have the potential to derail conversations, raising important issues for
LLM application.
Link: http://arxiv.org/abs/2402.11571v1


Title: Fine-tuning Large Language Model (LLM) Artificial Intelligence Chatbots
  in Ophthalmology and LLM-based evaluation using GPT-4
Authors: Ting Fang Tan, Kabilan Elangovan, Liyuan Jin, Yao Jie, Li Yong, Joshua Lim, Stanley Poh, Wei Yan Ng, Daniel Lim, Yuhe Ke, Nan Liu, Daniel Shu Wei Ting
Abstract: Purpose: To assess the alignment of GPT-4-based evaluation to human clinician
experts, for the evaluation of responses to ophthalmology-related patient
queries generated by fine-tuned LLM chatbots. Methods: 400 ophthalmology
questions and paired answers were created by ophthalmologists to represent
commonly asked patient questions, divided into fine-tuning (368; 92%), and
testing (40; 8%). We find-tuned 5 different LLMs, including LLAMA2-7b,
LLAMA2-7b-Chat, LLAMA2-13b, and LLAMA2-13b-Chat. For the testing dataset,
additional 8 glaucoma QnA pairs were included. 200 responses to the testing
dataset were generated by 5 fine-tuned LLMs for evaluation. A customized
clinical evaluation rubric was used to guide GPT-4 evaluation, grounded on
clinical accuracy, relevance, patient safety, and ease of understanding. GPT-4
evaluation was then compared against ranking by 5 clinicians for clinical
alignment. Results: Among all fine-tuned LLMs, GPT-3.5 scored the highest
(87.1%), followed by LLAMA2-13b (80.9%), LLAMA2-13b-chat (75.5%),
LLAMA2-7b-Chat (70%) and LLAMA2-7b (68.8%) based on the GPT-4 evaluation. GPT-4
evaluation demonstrated significant agreement with human clinician rankings,
with Spearman and Kendall Tau correlation coefficients of 0.90 and 0.80
respectively; while correlation based on Cohen Kappa was more modest at 0.50.
Notably, qualitative analysis and the glaucoma sub-analysis revealed clinical
inaccuracies in the LLM-generated responses, which were appropriately
identified by the GPT-4 evaluation. Conclusion: The notable clinical alignment
of GPT-4 evaluation highlighted its potential to streamline the clinical
evaluation of LLM chatbot responses to healthcare-related queries. By
complementing the existing clinician-dependent manual grading, this efficient
and automated evaluation could assist the validation of future developments in
LLM applications for healthcare.
Link: http://arxiv.org/abs/2402.10083v1


Title: Unmemorization in Large Language Models via Self-Distillation and
  Deliberate Imagination
Authors: Yijiang River Dong, Hongzhou Lin, Mikhail Belkin, Ramon Huerta, Ivan Vulić
Abstract: While displaying impressive generation capabilities across many tasks, Large
Language Models (LLMs) still struggle with crucial issues of privacy violation
and unwanted exposure of sensitive data. This raises an essential question: how
should we prevent such undesired behavior of LLMs while maintaining their
strong generation and natural language understanding (NLU) capabilities? In
this work, we introduce a novel approach termed deliberate imagination in the
context of LLM unlearning. Instead of trying to forget memorized data, we
employ a self-distillation framework, guiding LLMs to deliberately imagine
alternative scenarios. As demonstrated in a wide range of experiments, the
proposed method not only effectively unlearns targeted text but also preserves
the LLMs' capabilities in open-ended generation tasks as well as in NLU tasks.
Our results demonstrate the usefulness of this approach across different models
and sizes, and also with parameter-efficient fine-tuning, offering a novel
pathway to addressing the challenges with private and sensitive data in LLM
applications.
Link: http://arxiv.org/abs/2402.10052v1


Title: Anchor-based Large Language Models
Authors: Jianhui Pang, Fanghua Ye, Derek F. Wong, Longyue Wang
Abstract: Large language models (LLMs) predominantly employ decoder-only transformer
architectures, necessitating the retention of keys/values information for
historical tokens to provide contextual information and avoid redundant
computation. However, the substantial size and parameter volume of these LLMs
require massive GPU memory. This memory demand increases with the length of the
input text, leading to an urgent need for more efficient methods of information
storage and processing. This study introduces Anchor-based LLMs (AnLLMs), which
utilize an innovative anchor-based self-attention network (AnSAN) and also an
anchor-based inference strategy. This approach enables LLMs to compress
sequence information into an anchor token, reducing the keys/values cache and
enhancing inference efficiency. Experiments on question-answering benchmarks
reveal that AnLLMs maintain similar accuracy levels while achieving up to 99%
keys/values cache reduction and up to 3.5 times faster inference. Despite a
minor compromise in accuracy, the substantial enhancements of AnLLMs employing
the AnSAN technique in resource utilization and computational efficiency
underscore their potential for practical LLM applications.
Link: http://arxiv.org/abs/2402.07616v2


Title: T-RAG: Lessons from the LLM Trenches
Authors: Masoomali Fatehkia, Ji Kim Lucas, Sanjay Chawla
Abstract: Large Language Models (LLM) have shown remarkable language capabilities
fueling attempts to integrate them into applications across a wide range of
domains. An important application area is question answering over private
enterprise documents where the main considerations are data security, which
necessitates applications that can be deployed on-prem, limited computational
resources and the need for a robust application that correctly responds to
queries. Retrieval-Augmented Generation (RAG) has emerged as the most prominent
framework for building LLM-based applications. While building a RAG is
relatively straightforward, making it robust and a reliable application
requires extensive customization and relatively deep knowledge of the
application domain. We share our experiences building and deploying an LLM
application for question answering over private organizational documents. Our
application combines the use of RAG with a finetuned open-source LLM.
Additionally, our system, which we call Tree-RAG (T-RAG), uses a tree structure
to represent entity hierarchies within the organization. This is used to
generate a textual description to augment the context when responding to user
queries pertaining to entities within the organization's hierarchy. Our
evaluations show that this combination performs better than a simple RAG or
finetuning implementation. Finally, we share some lessons learned based on our
experiences building an LLM application for real-world use.
Link: http://arxiv.org/abs/2402.07483v1




--------------------------------------------------------------------------------
Research_Action_1 (to chat_manager):

Based on the retrieved abstracts, here is a markdown table categorizing the papers by domain, along with their titles, authors, summaries, and links:

| Domain | Title | Authors | Summary | Link |
|--------|-------|---------|---------|------|
| Security | PRSA: Prompt Reverse Stealing Attacks against Large Language Models | Yong Yang, Xuhong Zhang, Yi Jiang, Xi Chen, Haoyu Wang, Shouling Ji, Zonghui Wang | The paper explores the security risks associated with exposing input-output pairs of prompts used in LLMs and proposes a novel attack framework, PRSA, to reverse-steal prompts, posing a threat to intellectual property rights. | [Link](http://arxiv.org/abs/2402.19200v1) |
| Ethics & Evaluation | Political Compass or Spinning Arrow? Towards More Meaningful Evaluations for Values and Opinions in Large Language Models | Paul Röttger, Valentin Hofmann, Valentina Pyatkin, Musashi Hinck, Hannah Rose Kirk, Hinrich Schütze, Dirk Hovy | This work challenges the constrained evaluation paradigm for values and opinions in LLMs and explores more realistic unconstrained evaluations, focusing on the Political Compass Test (PCT). | [Link](http://arxiv.org/abs/2402.16786v1) |
| Urban Mobility | Large Language Models as Urban Residents: An LLM Agent Framework for Personal Mobility Generation | Jiawei Wang, Renhe Jiang, Chuang Yang, Zengqing Wu, Makoto Onizuka, Ryosuke Shibasaki, Chuan Xiao | Introduces an LLM agent framework for personal mobility generation, aligning LLMs with real-world urban mobility data, and offering a tool for urban mobility analysis. | [Link](http://arxiv.org/abs/2402.14744v1) |
| Bioinformatics | An Evaluation of Large Language Models in Bioinformatics Research | Hengchuang Yin, Zhonghui Gu, Fanhao Wang, Yiparemu Abuduhaibaier, Yanqiao Zhu, Xinming Tu, Xian-Sheng Hua, Xiao Luo, Yizhou Sun | Evaluates the performance of LLMs on bioinformatics tasks, highlighting their potential and limitations, and motivating future research in LLM applications in bioinformatics. | [Link](http://arxiv.org/abs/2402.13714v1) |
| Privacy | Privacy-Preserving Instructions for Aligning Large Language Models | Da Yu, Peter Kairouz, Sewoong Oh, Zheng Xu | Proposes using synthetic instructions generated by privately fine-tuned generators to replace real instructions in data annotation and model fine-tuning, ensuring privacy while maintaining utility. | [Link](http://arxiv.org/abs/2402.13659v1) |
| Social Robotics | Ain't Misbehavin' -- Using LLMs to Generate Expressive Robot Behavior in Conversations with the Tabletop Robot Haru | Zining Wang, Paul Reisert, Eric Nichols, Randy Gomez | Integrates LLMs into social robots to generate dynamic and expressive conversations, using a text-to-speech engine and a library of physical actions for the robot. | [Link](http://arxiv.org/abs/2402.11571v1) |
| Ophthalmology | Fine-tuning Large Language Model (LLM) Artificial Intelligence Chatbots in Ophthalmology and LLM-based evaluation using GPT-4 | Ting Fang Tan, Kabilan Elangovan, Liyuan Jin, Yao Jie, Li Yong, Joshua Lim, Stanley Poh, Wei Yan Ng, Daniel Lim, Yuhe Ke, Nan Liu, Daniel Shu Wei Ting | Assesses the alignment of GPT-4-based evaluation to human clinician experts for evaluating responses to ophthalmology-related patient queries generated by fine-tuned LLM chatbots. | [Link](http://arxiv.org/abs/2402.10083v1) |
| Privacy & Data Security | Unmemorization in Large Language Models via Self-Distillation and Deliberate Imagination | Yijiang River Dong, Hongzhou Lin, Mikhail Belkin, Ramon Huerta, Ivan Vulić | Introduces a novel approach for LLM unlearning by guiding LLMs to imagine alternative scenarios, effectively unlearning targeted text while preserving generation and NLU capabilities. | [Link](http://arxiv.org/abs/2402.10052v1) |
| Computational Efficiency | Anchor-based Large Language Models | Jianhui Pang, Fanghua Ye, Derek F. Wong, Longyue Wang | Proposes Anchor-based LLMs (AnLLMs) with an innovative anchor-based self-attention network (AnSAN) to reduce memory demand and enhance inference efficiency. | [Link](http://arxiv.org/abs/2402.07616v2) |
| Enterprise Applications | T-RAG: Lessons from the LLM Trenches | Masoomali Fatehkia, Ji Kim Lucas, Sanjay Chawla | Shares experiences building and deploying an LLM application for question answering over private organizational documents, combining RAG with a finetuned LLM and a tree structure for entity hierarchies. | [Link](http://arxiv.org/abs/2402.07483v1) |

These papers cover a range of domains including security, ethics, urban mobility, bioinformatics, privacy, social robotics, ophthalmology, data security, computational efficiency, and enterprise applications, showcasing the diverse applications of large language models.

--------------------------------------------------------------------------------
ChatResult(chat_id=None, chat_history=[{'content': 'Topic: LLM applications papers from last week. Requirement: 5 - 10 papers from different domains.', 'role': 'assistant'}, {'content': 'To retrieve related papers from the arXiv API, we can use Python with the `requests` library to send a query to the API and parse the response. Below is a Python script that searches for papers related to "LLM applications" (Large Language Models applications) from the last week, across different domains, and prints out the required information for 5 to 10 papers.\n\n```python\nimport requests\nfrom datetime import datetime, timedelta\nimport feedparser\n\n# Define the base URL for the arXiv API\nARXIV_API_URL = \'http://export.arxiv.org/api/query?\'\n\n# Define the search parameters\nsearch_query = \'all:"LLM applications"\'\nstart_date = (datetime.now() - timedelta(days=7)).strftime(\'%Y%m%d%H%M%S\')\nend_date = datetime.now().strftime(\'%Y%m%d%H%M%S\')\nstart = 0\nmax_results = 10\nsort_by = \'submittedDate\'\nsort_order = \'descending\'\n\n# Construct the query\nquery = f\'search_query={search_query}&sortBy={sort_by}&sortOrder={sort_order}&start={start}&max_results={max_results}\'\n\n# Send the request to the arXiv API\nresponse = requests.get(ARXIV_API_URL + query)\n\n# Parse the response using feedparser\nfeed = feedparser.parse(response.content)\n\n# Print the title, authors, abstract, and link of each paper\nfor entry in feed.entries:\n    print("Title:", entry.title)\n    print("Authors:", \', \'.join(author.name for author in entry.authors))\n    print("Abstract:", entry.summary)\n    print("Link:", entry.link)\n    print("\\n")\n\n# Check if we have at least 5 papers, if not, adjust the search or notify\nif len(feed.entries) < 5:\n    print("Less than 5 papers found. Consider adjusting the search parameters or timeframe.")\n```\n\nThis script will print the title, authors, abstract, and link for each paper related to "LLM applications" from the last week, up to a maximum of 10 papers. If fewer than 5 papers are found, it will notify the user to consider adjusting the search parameters or timeframe.', 'name': 'Retrieve_Action_1', 'role': 'user'}, {'content': "exitcode: 0 (execution succeeded)\nCode output: \nTitle: PRSA: Prompt Reverse Stealing Attacks against Large Language Models\nAuthors: Yong Yang, Xuhong Zhang, Yi Jiang, Xi Chen, Haoyu Wang, Shouling Ji, Zonghui Wang\nAbstract: Prompt, recognized as crucial intellectual property, enables large language\nmodels (LLMs) to perform specific tasks without the need of fine-tuning,\nunderscoring their escalating importance. With the rise of prompt-based\nservices, such as prompt marketplaces and LLM applications, providers often\ndisplay prompts' capabilities through input-output examples to attract users.\nHowever, this paradigm raises a pivotal security concern: does the exposure of\ninput-output pairs pose the risk of potential prompt leakage, infringing on the\nintellectual property rights of the developers? To our knowledge, this problem\nstill has not been comprehensively explored yet. To remedy this gap, in this\npaper, we perform the first in depth exploration and propose a novel attack\nframework for reverse-stealing prompts against commercial LLMs, namely PRSA.\nThe main idea of PRSA is that by analyzing the critical features of the\ninput-output pairs, we mimic and gradually infer (steal) the target prompts. In\ndetail, PRSA mainly consists of two key phases: prompt mutation and prompt\npruning. In the mutation phase, we propose a prompt attention algorithm based\non differential feedback to capture these critical features for effectively\ninferring the target prompts. In the prompt pruning phase, we identify and mask\nthe words dependent on specific inputs, enabling the prompts to accommodate\ndiverse inputs for generalization. Through extensive evaluation, we verify that\nPRSA poses a severe threat in real world scenarios. We have reported these\nfindings to prompt service providers and actively collaborate with them to take\nprotective measures for prompt copyright.\nLink: http://arxiv.org/abs/2402.19200v1\n\n\nTitle: Political Compass or Spinning Arrow? Towards More Meaningful Evaluations\n  for Values and Opinions in Large Language Models\nAuthors: Paul Röttger, Valentin Hofmann, Valentina Pyatkin, Musashi Hinck, Hannah Rose Kirk, Hinrich Schütze, Dirk Hovy\nAbstract: Much recent work seeks to evaluate values and opinions in large language\nmodels (LLMs) using multiple-choice surveys and questionnaires. Most of this\nwork is motivated by concerns around real-world LLM applications. For example,\npolitically-biased LLMs may subtly influence society when they are used by\nmillions of people. Such real-world concerns, however, stand in stark contrast\nto the artificiality of current evaluations: real users do not typically ask\nLLMs survey questions. Motivated by this discrepancy, we challenge the\nprevailing constrained evaluation paradigm for values and opinions in LLMs and\nexplore more realistic unconstrained evaluations. As a case study, we focus on\nthe popular Political Compass Test (PCT). In a systematic review, we find that\nmost prior work using the PCT forces models to comply with the PCT's\nmultiple-choice format. We show that models give substantively different\nanswers when not forced; that answers change depending on how models are\nforced; and that answers lack paraphrase robustness. Then, we demonstrate that\nmodels give different answers yet again in a more realistic open-ended answer\nsetting. We distill these findings into recommendations and open challenges in\nevaluating values and opinions in LLMs.\nLink: http://arxiv.org/abs/2402.16786v1\n\n\nTitle: Large Language Models as Urban Residents: An LLM Agent Framework for\n  Personal Mobility Generation\nAuthors: Jiawei Wang, Renhe Jiang, Chuang Yang, Zengqing Wu, Makoto Onizuka, Ryosuke Shibasaki, Chuan Xiao\nAbstract: This paper introduces a novel approach using Large Language Models (LLMs)\nintegrated into an agent framework for flexible and efficient personal mobility\ngeneration. LLMs overcome the limitations of previous models by efficiently\nprocessing semantic data and offering versatility in modeling various tasks.\nOur approach addresses the critical need to align LLMs with real-world urban\nmobility data, focusing on three research questions: aligning LLMs with rich\nactivity data, developing reliable activity generation strategies, and\nexploring LLM applications in urban mobility. The key technical contribution is\na novel LLM agent framework that accounts for individual activity patterns and\nmotivations, including a self-consistency approach to align LLMs with\nreal-world activity data and a retrieval-augmented strategy for interpretable\nactivity generation. In experimental studies, comprehensive validation is\nperformed using real-world data. This research marks the pioneering work of\ndesigning an LLM agent framework for activity generation based on real-world\nhuman activity data, offering a promising tool for urban mobility analysis.\nLink: http://arxiv.org/abs/2402.14744v1\n\n\nTitle: An Evaluation of Large Language Models in Bioinformatics Research\nAuthors: Hengchuang Yin, Zhonghui Gu, Fanhao Wang, Yiparemu Abuduhaibaier, Yanqiao Zhu, Xinming Tu, Xian-Sheng Hua, Xiao Luo, Yizhou Sun\nAbstract: Large language models (LLMs) such as ChatGPT have gained considerable\ninterest across diverse research communities. Their notable ability for text\ncompletion and generation has inaugurated a novel paradigm for\nlanguage-interfaced problem solving. However, the potential and efficacy of\nthese models in bioinformatics remain incompletely explored. In this work, we\nstudy the performance LLMs on a wide spectrum of crucial bioinformatics tasks.\nThese tasks include the identification of potential coding regions, extraction\nof named entities for genes and proteins, detection of antimicrobial and\nanti-cancer peptides, molecular optimization, and resolution of educational\nbioinformatics problems. Our findings indicate that, given appropriate prompts,\nLLMs like GPT variants can successfully handle most of these tasks. In\naddition, we provide a thorough analysis of their limitations in the context of\ncomplicated bioinformatics tasks. In conclusion, we believe that this work can\nprovide new perspectives and motivate future research in the field of LLMs\napplications, AI for Science and bioinformatics.\nLink: http://arxiv.org/abs/2402.13714v1\n\n\nTitle: Privacy-Preserving Instructions for Aligning Large Language Models\nAuthors: Da Yu, Peter Kairouz, Sewoong Oh, Zheng Xu\nAbstract: Service providers of large language model (LLM) applications collect user\ninstructions in the wild and use them in further aligning LLMs with users'\nintentions. These instructions, which potentially contain sensitive\ninformation, are annotated by human workers in the process. This poses a new\nprivacy risk not addressed by the typical private optimization. To this end, we\npropose using synthetic instructions to replace real instructions in data\nannotation and model fine-tuning. Formal differential privacy is guaranteed by\ngenerating those synthetic instructions using privately fine-tuned generators.\nCrucial in achieving the desired utility is our novel filtering algorithm that\nmatches the distribution of the synthetic instructions to that of the real\nones. In both supervised fine-tuning and reinforcement learning from human\nfeedback, our extensive experiments demonstrate the high utility of the final\nset of synthetic instructions by showing comparable results to real\ninstructions. In supervised fine-tuning, models trained with private synthetic\ninstructions outperform leading open-source models such as Vicuna.\nLink: http://arxiv.org/abs/2402.13659v1\n\n\nTitle: Ain't Misbehavin' -- Using LLMs to Generate Expressive Robot Behavior in\n  Conversations with the Tabletop Robot Haru\nAuthors: Zining Wang, Paul Reisert, Eric Nichols, Randy Gomez\nAbstract: Social robots aim to establish long-term bonds with humans through engaging\nconversation. However, traditional conversational approaches, reliant on\nscripted interactions, often fall short in maintaining engaging conversations.\nThis paper addresses this limitation by integrating large language models\n(LLMs) into social robots to achieve more dynamic and expressive conversations.\nWe introduce a fully-automated conversation system that leverages LLMs to\ngenerate robot responses with expressive behaviors, congruent with the robot's\npersonality. We incorporate robot behavior with two modalities: 1) a\ntext-to-speech (TTS) engine capable of various delivery styles, and 2) a\nlibrary of physical actions for the robot. We develop a custom,\nstate-of-the-art emotion recognition model to dynamically select the robot's\ntone of voice and utilize emojis from LLM output as cues for generating robot\nactions. A demo of our system is available here. To illuminate design and\nimplementation issues, we conduct a pilot study where volunteers chat with a\nsocial robot using our proposed system, and we analyze their feedback,\nconducting a rigorous error analysis of chat transcripts. Feedback was\noverwhelmingly positive, with participants commenting on the robot's empathy,\nhelpfulness, naturalness, and entertainment. Most negative feedback was due to\nautomatic speech recognition (ASR) errors which had limited impact on\nconversations. However, we observed a small class of errors, such as the LLM\nrepeating itself or hallucinating fictitious information and human responses,\nthat have the potential to derail conversations, raising important issues for\nLLM application.\nLink: http://arxiv.org/abs/2402.11571v1\n\n\nTitle: Fine-tuning Large Language Model (LLM) Artificial Intelligence Chatbots\n  in Ophthalmology and LLM-based evaluation using GPT-4\nAuthors: Ting Fang Tan, Kabilan Elangovan, Liyuan Jin, Yao Jie, Li Yong, Joshua Lim, Stanley Poh, Wei Yan Ng, Daniel Lim, Yuhe Ke, Nan Liu, Daniel Shu Wei Ting\nAbstract: Purpose: To assess the alignment of GPT-4-based evaluation to human clinician\nexperts, for the evaluation of responses to ophthalmology-related patient\nqueries generated by fine-tuned LLM chatbots. Methods: 400 ophthalmology\nquestions and paired answers were created by ophthalmologists to represent\ncommonly asked patient questions, divided into fine-tuning (368; 92%), and\ntesting (40; 8%). We find-tuned 5 different LLMs, including LLAMA2-7b,\nLLAMA2-7b-Chat, LLAMA2-13b, and LLAMA2-13b-Chat. For the testing dataset,\nadditional 8 glaucoma QnA pairs were included. 200 responses to the testing\ndataset were generated by 5 fine-tuned LLMs for evaluation. A customized\nclinical evaluation rubric was used to guide GPT-4 evaluation, grounded on\nclinical accuracy, relevance, patient safety, and ease of understanding. GPT-4\nevaluation was then compared against ranking by 5 clinicians for clinical\nalignment. Results: Among all fine-tuned LLMs, GPT-3.5 scored the highest\n(87.1%), followed by LLAMA2-13b (80.9%), LLAMA2-13b-chat (75.5%),\nLLAMA2-7b-Chat (70%) and LLAMA2-7b (68.8%) based on the GPT-4 evaluation. GPT-4\nevaluation demonstrated significant agreement with human clinician rankings,\nwith Spearman and Kendall Tau correlation coefficients of 0.90 and 0.80\nrespectively; while correlation based on Cohen Kappa was more modest at 0.50.\nNotably, qualitative analysis and the glaucoma sub-analysis revealed clinical\ninaccuracies in the LLM-generated responses, which were appropriately\nidentified by the GPT-4 evaluation. Conclusion: The notable clinical alignment\nof GPT-4 evaluation highlighted its potential to streamline the clinical\nevaluation of LLM chatbot responses to healthcare-related queries. By\ncomplementing the existing clinician-dependent manual grading, this efficient\nand automated evaluation could assist the validation of future developments in\nLLM applications for healthcare.\nLink: http://arxiv.org/abs/2402.10083v1\n\n\nTitle: Unmemorization in Large Language Models via Self-Distillation and\n  Deliberate Imagination\nAuthors: Yijiang River Dong, Hongzhou Lin, Mikhail Belkin, Ramon Huerta, Ivan Vulić\nAbstract: While displaying impressive generation capabilities across many tasks, Large\nLanguage Models (LLMs) still struggle with crucial issues of privacy violation\nand unwanted exposure of sensitive data. This raises an essential question: how\nshould we prevent such undesired behavior of LLMs while maintaining their\nstrong generation and natural language understanding (NLU) capabilities? In\nthis work, we introduce a novel approach termed deliberate imagination in the\ncontext of LLM unlearning. Instead of trying to forget memorized data, we\nemploy a self-distillation framework, guiding LLMs to deliberately imagine\nalternative scenarios. As demonstrated in a wide range of experiments, the\nproposed method not only effectively unlearns targeted text but also preserves\nthe LLMs' capabilities in open-ended generation tasks as well as in NLU tasks.\nOur results demonstrate the usefulness of this approach across different models\nand sizes, and also with parameter-efficient fine-tuning, offering a novel\npathway to addressing the challenges with private and sensitive data in LLM\napplications.\nLink: http://arxiv.org/abs/2402.10052v1\n\n\nTitle: Anchor-based Large Language Models\nAuthors: Jianhui Pang, Fanghua Ye, Derek F. Wong, Longyue Wang\nAbstract: Large language models (LLMs) predominantly employ decoder-only transformer\narchitectures, necessitating the retention of keys/values information for\nhistorical tokens to provide contextual information and avoid redundant\ncomputation. However, the substantial size and parameter volume of these LLMs\nrequire massive GPU memory. This memory demand increases with the length of the\ninput text, leading to an urgent need for more efficient methods of information\nstorage and processing. This study introduces Anchor-based LLMs (AnLLMs), which\nutilize an innovative anchor-based self-attention network (AnSAN) and also an\nanchor-based inference strategy. This approach enables LLMs to compress\nsequence information into an anchor token, reducing the keys/values cache and\nenhancing inference efficiency. Experiments on question-answering benchmarks\nreveal that AnLLMs maintain similar accuracy levels while achieving up to 99%\nkeys/values cache reduction and up to 3.5 times faster inference. Despite a\nminor compromise in accuracy, the substantial enhancements of AnLLMs employing\nthe AnSAN technique in resource utilization and computational efficiency\nunderscore their potential for practical LLM applications.\nLink: http://arxiv.org/abs/2402.07616v2\n\n\nTitle: T-RAG: Lessons from the LLM Trenches\nAuthors: Masoomali Fatehkia, Ji Kim Lucas, Sanjay Chawla\nAbstract: Large Language Models (LLM) have shown remarkable language capabilities\nfueling attempts to integrate them into applications across a wide range of\ndomains. An important application area is question answering over private\nenterprise documents where the main considerations are data security, which\nnecessitates applications that can be deployed on-prem, limited computational\nresources and the need for a robust application that correctly responds to\nqueries. Retrieval-Augmented Generation (RAG) has emerged as the most prominent\nframework for building LLM-based applications. While building a RAG is\nrelatively straightforward, making it robust and a reliable application\nrequires extensive customization and relatively deep knowledge of the\napplication domain. We share our experiences building and deploying an LLM\napplication for question answering over private organizational documents. Our\napplication combines the use of RAG with a finetuned open-source LLM.\nAdditionally, our system, which we call Tree-RAG (T-RAG), uses a tree structure\nto represent entity hierarchies within the organization. This is used to\ngenerate a textual description to augment the context when responding to user\nqueries pertaining to entities within the organization's hierarchy. Our\nevaluations show that this combination performs better than a simple RAG or\nfinetuning implementation. Finally, we share some lessons learned based on our\nexperiences building an LLM application for real-world use.\nLink: http://arxiv.org/abs/2402.07483v1\n\n\n", 'name': 'Retrieve_Action_2', 'role': 'user'}, {'content': "Based on the retrieved abstracts, here is a markdown table categorizing the papers by domain, along with their titles, authors, summaries, and links:\n\n| Domain | Title | Authors | Summary | Link |\n|--------|-------|---------|---------|------|\n| Security | PRSA: Prompt Reverse Stealing Attacks against Large Language Models | Yong Yang, Xuhong Zhang, Yi Jiang, Xi Chen, Haoyu Wang, Shouling Ji, Zonghui Wang | The paper explores the security risks associated with exposing input-output pairs of prompts used in LLMs and proposes a novel attack framework, PRSA, to reverse-steal prompts, posing a threat to intellectual property rights. | [Link](http://arxiv.org/abs/2402.19200v1) |\n| Ethics & Evaluation | Political Compass or Spinning Arrow? Towards More Meaningful Evaluations for Values and Opinions in Large Language Models | Paul Röttger, Valentin Hofmann, Valentina Pyatkin, Musashi Hinck, Hannah Rose Kirk, Hinrich Schütze, Dirk Hovy | This work challenges the constrained evaluation paradigm for values and opinions in LLMs and explores more realistic unconstrained evaluations, focusing on the Political Compass Test (PCT). | [Link](http://arxiv.org/abs/2402.16786v1) |\n| Urban Mobility | Large Language Models as Urban Residents: An LLM Agent Framework for Personal Mobility Generation | Jiawei Wang, Renhe Jiang, Chuang Yang, Zengqing Wu, Makoto Onizuka, Ryosuke Shibasaki, Chuan Xiao | Introduces an LLM agent framework for personal mobility generation, aligning LLMs with real-world urban mobility data, and offering a tool for urban mobility analysis. | [Link](http://arxiv.org/abs/2402.14744v1) |\n| Bioinformatics | An Evaluation of Large Language Models in Bioinformatics Research | Hengchuang Yin, Zhonghui Gu, Fanhao Wang, Yiparemu Abuduhaibaier, Yanqiao Zhu, Xinming Tu, Xian-Sheng Hua, Xiao Luo, Yizhou Sun | Evaluates the performance of LLMs on bioinformatics tasks, highlighting their potential and limitations, and motivating future research in LLM applications in bioinformatics. | [Link](http://arxiv.org/abs/2402.13714v1) |\n| Privacy | Privacy-Preserving Instructions for Aligning Large Language Models | Da Yu, Peter Kairouz, Sewoong Oh, Zheng Xu | Proposes using synthetic instructions generated by privately fine-tuned generators to replace real instructions in data annotation and model fine-tuning, ensuring privacy while maintaining utility. | [Link](http://arxiv.org/abs/2402.13659v1) |\n| Social Robotics | Ain't Misbehavin' -- Using LLMs to Generate Expressive Robot Behavior in Conversations with the Tabletop Robot Haru | Zining Wang, Paul Reisert, Eric Nichols, Randy Gomez | Integrates LLMs into social robots to generate dynamic and expressive conversations, using a text-to-speech engine and a library of physical actions for the robot. | [Link](http://arxiv.org/abs/2402.11571v1) |\n| Ophthalmology | Fine-tuning Large Language Model (LLM) Artificial Intelligence Chatbots in Ophthalmology and LLM-based evaluation using GPT-4 | Ting Fang Tan, Kabilan Elangovan, Liyuan Jin, Yao Jie, Li Yong, Joshua Lim, Stanley Poh, Wei Yan Ng, Daniel Lim, Yuhe Ke, Nan Liu, Daniel Shu Wei Ting | Assesses the alignment of GPT-4-based evaluation to human clinician experts for evaluating responses to ophthalmology-related patient queries generated by fine-tuned LLM chatbots. | [Link](http://arxiv.org/abs/2402.10083v1) |\n| Privacy & Data Security | Unmemorization in Large Language Models via Self-Distillation and Deliberate Imagination | Yijiang River Dong, Hongzhou Lin, Mikhail Belkin, Ramon Huerta, Ivan Vulić | Introduces a novel approach for LLM unlearning by guiding LLMs to imagine alternative scenarios, effectively unlearning targeted text while preserving generation and NLU capabilities. | [Link](http://arxiv.org/abs/2402.10052v1) |\n| Computational Efficiency | Anchor-based Large Language Models | Jianhui Pang, Fanghua Ye, Derek F. Wong, Longyue Wang | Proposes Anchor-based LLMs (AnLLMs) with an innovative anchor-based self-attention network (AnSAN) to reduce memory demand and enhance inference efficiency. | [Link](http://arxiv.org/abs/2402.07616v2) |\n| Enterprise Applications | T-RAG: Lessons from the LLM Trenches | Masoomali Fatehkia, Ji Kim Lucas, Sanjay Chawla | Shares experiences building and deploying an LLM application for question answering over private organizational documents, combining RAG with a finetuned LLM and a tree structure for entity hierarchies. | [Link](http://arxiv.org/abs/2402.07483v1) |\n\nThese papers cover a range of domains including security, ethics, urban mobility, bioinformatics, privacy, social robotics, ophthalmology, data security, computational efficiency, and enterprise applications, showcasing the diverse applications of large language models.", 'name': 'Research_Action_1', 'role': 'user'}], summary="Based on the retrieved abstracts, here is a markdown table categorizing the papers by domain, along with their titles, authors, summaries, and links:\n\n| Domain | Title | Authors | Summary | Link |\n|--------|-------|---------|---------|------|\n| Security | PRSA: Prompt Reverse Stealing Attacks against Large Language Models | Yong Yang, Xuhong Zhang, Yi Jiang, Xi Chen, Haoyu Wang, Shouling Ji, Zonghui Wang | The paper explores the security risks associated with exposing input-output pairs of prompts used in LLMs and proposes a novel attack framework, PRSA, to reverse-steal prompts, posing a threat to intellectual property rights. | [Link](http://arxiv.org/abs/2402.19200v1) |\n| Ethics & Evaluation | Political Compass or Spinning Arrow? Towards More Meaningful Evaluations for Values and Opinions in Large Language Models | Paul Röttger, Valentin Hofmann, Valentina Pyatkin, Musashi Hinck, Hannah Rose Kirk, Hinrich Schütze, Dirk Hovy | This work challenges the constrained evaluation paradigm for values and opinions in LLMs and explores more realistic unconstrained evaluations, focusing on the Political Compass Test (PCT). | [Link](http://arxiv.org/abs/2402.16786v1) |\n| Urban Mobility | Large Language Models as Urban Residents: An LLM Agent Framework for Personal Mobility Generation | Jiawei Wang, Renhe Jiang, Chuang Yang, Zengqing Wu, Makoto Onizuka, Ryosuke Shibasaki, Chuan Xiao | Introduces an LLM agent framework for personal mobility generation, aligning LLMs with real-world urban mobility data, and offering a tool for urban mobility analysis. | [Link](http://arxiv.org/abs/2402.14744v1) |\n| Bioinformatics | An Evaluation of Large Language Models in Bioinformatics Research | Hengchuang Yin, Zhonghui Gu, Fanhao Wang, Yiparemu Abuduhaibaier, Yanqiao Zhu, Xinming Tu, Xian-Sheng Hua, Xiao Luo, Yizhou Sun | Evaluates the performance of LLMs on bioinformatics tasks, highlighting their potential and limitations, and motivating future research in LLM applications in bioinformatics. | [Link](http://arxiv.org/abs/2402.13714v1) |\n| Privacy | Privacy-Preserving Instructions for Aligning Large Language Models | Da Yu, Peter Kairouz, Sewoong Oh, Zheng Xu | Proposes using synthetic instructions generated by privately fine-tuned generators to replace real instructions in data annotation and model fine-tuning, ensuring privacy while maintaining utility. | [Link](http://arxiv.org/abs/2402.13659v1) |\n| Social Robotics | Ain't Misbehavin' -- Using LLMs to Generate Expressive Robot Behavior in Conversations with the Tabletop Robot Haru | Zining Wang, Paul Reisert, Eric Nichols, Randy Gomez | Integrates LLMs into social robots to generate dynamic and expressive conversations, using a text-to-speech engine and a library of physical actions for the robot. | [Link](http://arxiv.org/abs/2402.11571v1) |\n| Ophthalmology | Fine-tuning Large Language Model (LLM) Artificial Intelligence Chatbots in Ophthalmology and LLM-based evaluation using GPT-4 | Ting Fang Tan, Kabilan Elangovan, Liyuan Jin, Yao Jie, Li Yong, Joshua Lim, Stanley Poh, Wei Yan Ng, Daniel Lim, Yuhe Ke, Nan Liu, Daniel Shu Wei Ting | Assesses the alignment of GPT-4-based evaluation to human clinician experts for evaluating responses to ophthalmology-related patient queries generated by fine-tuned LLM chatbots. | [Link](http://arxiv.org/abs/2402.10083v1) |\n| Privacy & Data Security | Unmemorization in Large Language Models via Self-Distillation and Deliberate Imagination | Yijiang River Dong, Hongzhou Lin, Mikhail Belkin, Ramon Huerta, Ivan Vulić | Introduces a novel approach for LLM unlearning by guiding LLMs to imagine alternative scenarios, effectively unlearning targeted text while preserving generation and NLU capabilities. | [Link](http://arxiv.org/abs/2402.10052v1) |\n| Computational Efficiency | Anchor-based Large Language Models | Jianhui Pang, Fanghua Ye, Derek F. Wong, Longyue Wang | Proposes Anchor-based LLMs (AnLLMs) with an innovative anchor-based self-attention network (AnSAN) to reduce memory demand and enhance inference efficiency. | [Link](http://arxiv.org/abs/2402.07616v2) |\n| Enterprise Applications | T-RAG: Lessons from the LLM Trenches | Masoomali Fatehkia, Ji Kim Lucas, Sanjay Chawla | Shares experiences building and deploying an LLM application for question answering over private organizational documents, combining RAG with a finetuned LLM and a tree structure for entity hierarchies. | [Link](http://arxiv.org/abs/2402.07483v1) |\n\nThese papers cover a range of domains including security, ethics, urban mobility, bioinformatics, privacy, social robotics, ophthalmology, data security, computational efficiency, and enterprise applications, showcasing the diverse applications of large language models.", cost=({'total_cost': 0}, {'total_cost': 0}), human_input=[])