TradeBot: Bandit learning for hyper-parameters optimization of high frequency trading strategy. (April 2022)
- Record Type:
- Journal Article
- Title:
- TradeBot: Bandit learning for hyper-parameters optimization of high frequency trading strategy. (April 2022)
- Main Title:
- TradeBot: Bandit learning for hyper-parameters optimization of high frequency trading strategy
- Authors:
- Zhang, Weipeng
Wang, Lu
Xie, Liang
Feng, Ke
Liu, Xiang
- Abstract:
- Highlights: We formulate quantitative trading policy learning as a reinforcement learning problem and propose a reward-agnostic UCB method to learn the dynamically adjustable hyper-parameters of trading strategies with a powerful back-testing system. We leverage inverse reinforcement learning to learn a reward function for accurately estimating the profit of each trading order. We show promising performance on real-world high-frequency trading in the China Commodity Futures market. To the best of our knowledge, this is the first work deployed in an online trading system via reinforcement learning.
Abstract: Quantitative trading takes advantage of mathematical functions to make stock or futures trading decisions automatically. Specifically, various trading strategies proposed by human experts are associated with weight hyper-parameters that determine the probability of selecting a specific strategy according to market conditions. Manually adjusting the weight hyper-parameters, as in prior work, is error-prone, and the essential advantage of quantitative trading, i.e., automation, is lost. In this paper, we propose a dynamic parameter tuning algorithm, TradeBot, based on bandit learning for quantitative trading. We treat the sequential selection of hyper-parameters of trading rules as a bandit game, where a set of hyper-parameters of a trading rule is considered an action. A novel reward-agnostic Upper Confidence Bound bandit method is proposed to solve the automatic trading problem with a reward function estimated by inverse reinforcement learning. Experimental results on China Commodity Futures market data show state-of-the-art performance. To the best of our knowledge, this is one of the first works in the published literature to be deployed in an online trading system via reinforcement learning.
- Is Part Of:
- Pattern recognition. Volume 124(2022)
- Journal:
- Pattern recognition
- Issue:
- Volume 124(2022)
- Issue Display:
- Volume 124, Issue 2022 (2022)
- Year:
- 2022
- Volume:
- 124
- Issue:
- 2022
- Issue Sort Value:
- 2022-0124-2022-0000
- Page Start:
- Page End:
- Publication Date:
- 2022-04
- Subjects:
- High-Frequency trading -- Hyper-parameter optimization -- Multi-armed bandit learning -- Inverse reinforcement learning
Pattern perception -- Periodicals
Perception des structures -- Périodiques
Patroonherkenning
006.4
- Journal URLs:
- http://www.sciencedirect.com/science/journal/00313203
http://www.sciencedirect.com/
- DOI:
- 10.1016/j.patcog.2021.108490
- Languages:
- English
- ISSNs:
- 0031-3203
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 22256.xml