Efficient thrust generation in robotic fish caudal fins using policy search

Thrust generation is a crucial aspect of fish locomotion that depends on a variety of morphological and kinematic parameters. In this work, the kinematics of caudal fin motion of a robotic fish are optimised experimentally. The robotic fish actuates its caudal fin with flapping and rotation motion and measures the hydrodynamic force and torque on the fin. A total of nine caudal-fin designs are investigated, combining three shapes (i.e. inclination angles) with three stiffness levels. The optimisation is based on a policy search (PS) algorithm, which is used to maximise the thrust-generation efficiency of the caudal fins. The authors first parametrise fin spanwise rotation as a sinusoidal function of rotation amplitude and phase delay and test whether it benefits thrust-generation efficiency. The results show that rotation does not contribute to the efficiency, as the efficiency is maximised at zero rotation amplitude. Next, the authors optimise flapping amplitude and trajectory profile without fin rotation. The results show that smaller flapping amplitudes yield higher efficiency and that linear flapping trajectories are preferred over sinusoidal ones. The most flexible fins are the most efficient in thrust generation although they generate less thrust, and an inclination angle of 30° yields the most efficient fin shape.


Introduction
Fishes are efficient swimmers that locomote through complex fluid-structure interaction. Continued interest in fish locomotion aims to understand how the fish body and fins [1], with their distinct morphological features and structures [2][3][4][5], generate hydrodynamic propulsive forces through undulatory or oscillatory motion [6][7][8][9], and to uncover the underlying hydrodynamics [10][11][12][13][14]. These research efforts use either experiments or computational fluid dynamics to measure or calculate the flow field generated by the fish's body-fin motion and to study the underlying vortex formation and dynamics. Inspired by fish locomotion, a variety of robotic fishes for novel underwater propulsion have been proposed, including those with single-joint designs [15,16], multi-joint designs [17,18] and smart-material-based designs [19][20][21].
Fish swimming patterns can be classified into body and/or caudal fin (BCF) locomotion and median and/or paired fin (MPF) locomotion [22]. In BCF locomotion, the caudal fin and the body are the main thrust generators. The efficiency of thrust generation by caudal fins has been investigated extensively for different fin shapes and stiffnesses, using either experimental or computational methods. Parasar et al. [15] designed and tested flapping caudal fins with different configurations in a tow tank and found that both shape and stiffness had significant effects on thrust generation. Feilich and Lauder [16] tested caudal fins with different stiffnesses and with shapes varying from tuna-like forked shapes to square-like shapes. They used particle image velocimetry (PIV) to capture the flow field in the wake of the caudal fin, which was then used to estimate the propulsive forces. They concluded that there was no single 'optimal' choice of shape and stiffness, owing to the complex interaction between the two. Arun et al. [23] used a computational method to study how the shape and stiffness of caudal fins affect the efficiency of thrust generation. They found that the optimal fin shape lay between fork-like and rectangular and that leading-edge vortices (LEVs) played an important role in achieving high propulsive performance.
Although previous studies successfully investigated the thrust-generation performance of caudal fins, the investigations were largely limited to certain prescribed fin kinematic patterns (e.g. fixed flapping amplitudes or frequencies) rather than searching the entire kinematic parameter space. Because fin motion is continuous, even a motion with one or two degrees of freedom has a large kinematic parameter space to search. In this regard, reinforcement learning (RL) methods, in particular model-free policy search (PS) algorithms, can be a potent choice for searching the parameter space in real-time experimental robotic applications [24]. A PS algorithm uses a parameterised policy and acts directly in the parameter space. This makes it possible to scale down the search space of policies and allows the learning agent to handle high-dimensional state-action spaces. PS methods update the policy such that trajectories with higher rewards (here, the efficiency of thrust generation) become more likely to appear. In addition, PS methods can avoid significant alterations in the policy (which defines the behaviour of the learning agent at a given time) and can prevent the agent from entering undesirable regions of the state space [24].
The objective of this work is to optimise the caudal-fin motion patterns of a robotic fish for efficient thrust generation. The robotic fish has two degrees of freedom: flapping and rotation. A PS algorithm, namely parameter exploring policy gradient (PEPG), is applied to the robotic fish model operating in a mineral-oil tank. The thrust generated by the caudal fin and the actuation torque are measured by a six-component force/torque sensor while the robot is fixed rigidly in the tank. This work is divided into two stages. In the first stage (fin-rotation experiments), we test whether fin spanwise-rotation motion can improve the thrust-generation efficiency by optimising the rotation amplitude and phase delay (relative to flapping) for nine caudal-fin designs with three shapes and three stiffness levels. The Reynolds number (Re) (defined below) is fixed at 500 and the flapping amplitude is fixed at 30°. In the second stage (fin-flapping experiments), we parameterise and optimise the fin flapping trajectory using the flapping amplitude and trajectory profile in the absence of fin rotation. Since the flapping amplitude varies during the learning process, Re changes during these experiments.

Robotic fish model and experimental setup
The robotic fish model includes two servo motors, a pair of connection cranks, a servo case, a servo base, a six-axis force/torque sensor and the fish caudal fin, as shown in Fig. 1. The model has two degrees of freedom, i.e. flapping and rotation: servo #1 controls the flapping motion through the coupled crank and servo #2 controls the rotational motion directly. Each servo motor (XM450-W350-R, Robotics, Lake Forest, USA) has an integrated DC motor, a microcontroller, a magnetic encoder and a gear system, and uses a PID controller to control velocity and position with a resolution of 0.088°. The six-axis sensor (Nano 17-IP65, ATI) is attached to the output shaft of servo #2 and measures the total force and torque that drive the caudal fin in the x, y and z directions (with resolutions of 1/320 N and 1/64 N mm, respectively). The entire robotic fish model is submerged in a 0.8 m × 0.8 m × 0.8 m tank filled with white mineral oil with density ρ = 0.83 g/cm³ and kinematic viscosity ν = 6 × 10⁻⁶ m²/s at 22°C. The room temperature is held constant at 22°C so that the viscosity is not affected, and the effect of the tank walls has been tested and found to be negligible [25,26]. A sketch of the experimental setup is shown in Fig. 2a.
Motor control and data acquisition are performed by a real-time target machine (Performance Real-Time Target Machine, Speedgoat, Switzerland) interfacing with MATLAB and Simulink Real-Time. The PS algorithm is first programmed in Simulink and then downloaded to and run on the target machine. The target machine interfaces with the servo motors via an IO-323 board and RS-485 communication. The IO-323 board also interfaces with the Nano 17 at a sample rate of 250 Hz. The entire communication process is shown in Fig. 1b. The instantaneous positions of the servo motors are recorded by their internal encoders and compared with the reference signals from the target machine, which confirms that the trajectory tracking error is negligible (< 0.5°) with well-tuned PID controllers.

Fish caudal fin design and kinematic pattern
Three caudal fin shapes with the same total span length x_2 but different peduncle lengths x_1 and inclined angles θ are tested in the experiments (Fig. 2b). The span length is selected by taking into account the measurement range and the resolution of the Nano 17 force/torque sensor. Caudal fin flexural stiffness is another design factor that affects thrust generation [15]. It is varied by using two materials, polylactic acid (PLA) and polytetrafluoroethylene (PTFE), in various thicknesses (Table 1). The density ratios between the fins and the oil are 1.57 (PLA) and 2.59 (PTFE).

To quantify the flexural stiffness of the caudal fins, a non-dimensional stiffness, K, is used [27]. It is defined in terms of Young's modulus E, the second moment of area I of the cross-section of the caudal fin, the density ρ of the fin material, the velocity U_R2 at the R_2 position (Eqn. 5) and the total span length L of the caudal fin. In general, at the same flapping frequency, the smaller K is, the larger the deformation of the caudal fin. According to the literature [28], the caudal fin can be treated as rigid when K is larger than 6.25 at f = 0.25 Hz. For each of the three caudal fin shapes, three levels of flexibility (rigid, medium flexible and flexible) are designed, yielding a total of nine caudal fins tested in the experiments (Tables 1 and 2).

The Re, which represents the ratio of inertial forces to viscous forces in the fluid, is defined as:

\[ Re = \frac{U_{R_2}\,\bar{c}}{\nu} \tag{2} \]

where c̄ is the average chord-wise length, U_R2 is the reference velocity at the R_2 position (the radius of second moment of area, Eqn. 5) and ν is the kinematic viscosity of the mineral oil. c̄ is calculated as the total fish tail area divided by the span length x_2 (Fig. 2). U_R2 is calculated from the flapping amplitude ϕ_m at the radial position R_2. Substituting this velocity into Eqn. (2) expresses Re in terms of the flapping kinematics, c̄ and R_2. To maintain a constant Re, the average chord length c̄ and the radius of second moment of area R_2 are adjusted by tuning the peduncle length and height of the caudal fins, such that the product of c̄ and R_2 remains unchanged (with the other parameters held constant). R_2 is defined as:

\[ R_2 = \sqrt{\frac{1}{S}\int_0^R c(r)\, r^2 \, \mathrm{d}r} \tag{5} \]

where S is the total caudal fin area, R is the fin length and c(r) is the chord length at a given radial position r.
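As a rough numerical illustration of the Re scaling described above, the following Python sketch evaluates c̄, R_2 and Re for a hypothetical rectangular fin. The fin dimensions and the mean-speed convention U_R2 = 4 ϕ_m f R_2 (arc swept per cycle at radius R_2 multiplied by the frequency) are assumptions made for illustration only and may differ from the convention used in the experiments. Only ν, f and ϕ_m = 30° are taken from the text.

```python
import numpy as np

NU = 6e-6   # kinematic viscosity of the mineral oil in m^2/s (from the text)
F = 0.25    # flapping frequency in Hz (from the text)


def trapezoid(y, x):
    """Trapezoidal-rule integral of samples y over the grid x."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))


def fin_geometry(r, chord):
    """Return area S, mean chord c_bar and radius of second moment R2."""
    S = trapezoid(chord, r)
    c_bar = S / (r[-1] - r[0])                      # area divided by span length
    R2 = np.sqrt(trapezoid(chord * r**2, r) / S)    # radius of second moment of area
    return S, c_bar, float(R2)


def reynolds(phi_m, c_bar, R2, f=F, nu=NU):
    """Re = U_R2 * c_bar / nu, with the ASSUMED U_R2 = 4 * phi_m * f * R2."""
    U_R2 = 4.0 * phi_m * f * R2    # assumed mean speed at radius R2 (phi_m in rad)
    return U_R2 * c_bar / nu


# Hypothetical rectangular fin: 0.12 m long with a 0.08 m chord (not from the paper).
r = np.linspace(0.0, 0.12, 2001)
chord = np.full_like(r, 0.08)
S, c_bar, R2 = fin_geometry(r, chord)
Re = reynolds(np.deg2rad(30.0), c_bar, R2)
```

Under this assumed convention, Re depends on the geometry only through the product c̄ R_2, so doubling c̄ while halving R_2 leaves Re unchanged. This is the property used above to hold Re constant across fin shapes.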
In the first stage of this work (fin-rotation experiments), the caudal-fin motion is described by two sinusoidal functions, one for each servo motor: ϕ(t) defines the flapping motion and γ(t) defines the rotation motion. Here ϕ_m is the flapping amplitude, which is fixed at 30° in all first-stage experiments; f is the frequency, fixed at 0.25 Hz for both flapping and rotation; t is the time instant, t = T_s K, where T_s = 4 ms is the sampling period and K is the sample number; γ_m is the rotation amplitude; ψ_0 is the phase angle between the flapping and rotation motions; and γ_0 is the offset of the rotation phase, which is set to 90° for symmetric rotation. Both γ_m and ψ_0 are optimised through PS. In the first stage, Re is maintained at 500 while the nine caudal fins are tested to find the combination of features that yields the highest efficiency of thrust generation.

In the second stage (fin-flapping experiments), a flapping trajectory that can be varied from a sinusoidal to a triangular pattern is used [29]. It is parametrised by the flapping amplitude ϕ_m and a shape-tuning parameter 0 < K < 1 (not to be confused with the non-dimensional stiffness K), with the flapping frequency f fixed at 0.25 Hz. As K approaches zero, the trajectory becomes more sinusoidal (Fig. 3); as K approaches 1, it becomes more triangular, i.e. it has a linear profile. Both K and ϕ_m are optimised through PS. K is bounded between 0 and 0.8; the upper limit avoids the poor trajectory tracking caused by the high acceleration of a near-triangular trajectory. ϕ_m is bounded between 10° and 30°, which corresponds to the range observed in real fishes [30]. The purpose of this stage is to find the flapping amplitude and trajectory profile that lead to the highest reward (thrust/power) for each caudal fin.
Since the force and torque created by the fins can be significantly lower at small ϕ_m, in the second stage we scale up the nine caudal fins from the first stage to increase the force magnitude and thereby ensure adequate resolution of the force/torque measurements. Note that in this stage Re is not kept constant: it varies during the learning process as the flapping amplitude changes with the policy updates (the frequency is fixed).
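The two families of trajectories described above can be sketched in Python as follows. The exact expressions are assumptions: the sinusoids use a sine with an additive offset γ_0, and the shaped trajectory uses an arcsine form that is common in the flapping-wing literature and reproduces the stated limits (sinusoidal as K approaches 0, triangular as K approaches 1). The function names are hypothetical, and the form used in [29] may differ in detail.

```python
import numpy as np

F = 0.25      # flapping and rotation frequency in Hz (from the text)
T_S = 0.004   # sampling period in s (from the text)


def flapping_sinusoid(t, phi_m, f=F):
    """Stage-one flapping angle, ASSUMED form phi_m * sin(2*pi*f*t)."""
    return phi_m * np.sin(2.0 * np.pi * f * t)


def rotation_sinusoid(t, gamma_m, psi_0, gamma_0=np.deg2rad(90.0), f=F):
    """Stage-one rotation angle, ASSUMED form with phase psi_0 and offset gamma_0."""
    return gamma_0 + gamma_m * np.sin(2.0 * np.pi * f * t + psi_0)


def flapping_shaped(t, phi_m, K, f=F):
    """Stage-two flapping angle, ASSUMED arcsine form.

    K -> 0 recovers the sinusoid and K -> 1 approaches a triangular wave.
    """
    s = np.sin(2.0 * np.pi * f * t)
    if K < 1e-9:                      # limit K -> 0, avoids division by zero
        return phi_m * s
    return phi_m * np.arcsin(K * s) / np.arcsin(K)


# One flapping period sampled at T_S (1000 samples at 0.25 Hz).
t = T_S * np.arange(1000)
phi_triangular_like = flapping_shaped(t, np.deg2rad(20.0), 0.8)
phi_sinusoidal = flapping_sinusoid(t, np.deg2rad(20.0))
```

Both trajectories peak at ϕ_m; the shaped one reaches intermediate angles with a more uniform angular velocity, which is why large K demands higher acceleration near stroke reversal.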

Policy gradient algorithm
The learning task at hand is a good fit for episode-based PS methods [24] because of its clearly defined episodes, i.e. fin flapping cycle(s) with a time length T. Therefore, in this work, an episode-based learning algorithm named parameter exploring policy gradient (PEPG) is used [31]. This algorithm has been successfully applied to several robot-learning problems, including an analogous problem of periodic locomotion in fluids [32], in which the hovering efficiency of a flapping wing at low Re was optimised.
Here, the policy π_θ represents the fin motion trajectories described above, which are parameterised by a policy vector θ, i.e. the kinematic parameters of the fin motion trajectories that need to be optimised. Policy gradient methods commonly maximise the expected return J_π by gradient ascent, with the policy update:

\[ \theta \leftarrow \theta + \gamma \, \nabla_{\pi} J_{\pi} \]

where γ is the learning rate, ∇_π J_π is the policy gradient and θ is the policy parameter vector [32]. In episode-based algorithms, the policy gradient is estimated from the total cumulative reward of several rollouts (trials) that share the same policy. Then, at the end of each episode, the policy update is performed with this approximate policy gradient. Therefore, instantaneous rewards are not required for learning.
In classical likelihood-ratio policy gradient algorithms, the policy parameter vector θ is learned directly and exploration is performed in the action space. However, this usually leads to a high variance in the sampled histories, resulting in a noisy gradient estimate [31]. The PEPG algorithm, on the other hand, learns a distribution over the policy parameters rather than the parameters themselves, which shifts the exploration from the action space to the parameter space and reduces the variance. In turn, the reliability of the algorithm increases significantly [32], while the quality and speed of convergence improve. Specifically, PEPG introduces μ and σ, the mean and standard deviation of the policy parameter vector θ under an assumed Gaussian distribution, θ ∼ N(μ, Iσ²), and these distribution parameters are updated using the policy gradients.
To help the reinforcement learning algorithm converge faster to the true policy gradient with fewer rollouts, a simple moving-average baseline is also employed. Although a moving average is not the optimal baseline, experiments showed that it results in acceptable convergence rates. The details and the analysis of the algorithm are provided in [31,33].
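A minimal sketch of an episode-based PEPG loop with a moving-average baseline is given below. It follows the basic (non-symmetric) PEPG update, averaged over rollouts, and is not the Simulink implementation used in the experiments. The learning rates, the baseline window, the lower bound on σ and the toy reward function are all assumptions chosen for illustration; the 40 episodes of five rollouts match the settings reported later in the text.

```python
from collections import deque

import numpy as np


def pepg_update(mu, sigma, thetas, rewards, baseline,
                lr_mu=0.2, lr_sigma=0.1, sigma_min=1e-3):
    """One PEPG step: move (mu, sigma) along the sampled reward gradient."""
    diff = thetas - mu                        # exploration offsets, shape (N, d)
    adv = (rewards - baseline)[:, None]       # baseline-corrected rewards, shape (N, 1)
    grad_mu = np.mean(adv * diff, axis=0)
    grad_sigma = np.mean(adv * (diff**2 - sigma**2) / sigma, axis=0)
    new_mu = mu + lr_mu * grad_mu
    # Keep sigma strictly positive so that sampling stays valid (assumed safeguard).
    new_sigma = np.maximum(sigma + lr_sigma * grad_sigma, sigma_min)
    return new_mu, new_sigma


def run_pepg(reward_fn, mu0, sigma0, episodes=40, rollouts=5, window=5, seed=0):
    """Run PEPG; each episode draws `rollouts` parameter vectors from N(mu, sigma^2)."""
    rng = np.random.default_rng(seed)
    mu = np.asarray(mu0, dtype=float)
    sigma = np.asarray(sigma0, dtype=float)
    recent = deque(maxlen=window)             # mean rewards of the last few episodes
    for _ in range(episodes):
        thetas = rng.normal(mu, sigma, size=(rollouts, mu.size))
        rewards = np.array([reward_fn(th) for th in thetas])
        # Moving-average baseline; the first episode falls back to its own mean.
        baseline = float(np.mean(recent)) if recent else float(rewards.mean())
        mu, sigma = pepg_update(mu, sigma, thetas, rewards, baseline)
        recent.append(float(rewards.mean()))
    return mu, sigma


def toy_reward(theta):
    """Hypothetical smooth reward peaking at theta = (1.0, -0.5)."""
    return float(np.exp(-np.sum((theta - np.array([1.0, -0.5]))**2)))


mu_final, sigma_final = run_pepg(toy_reward, mu0=[0.0, 0.0], sigma0=[0.5, 0.5])
```

In the experiments, `reward_fn` would correspond to running one rollout on the robot and returning the measured efficiency, so each call is expensive. This is why the baseline and the small number of rollouts per episode matter.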

Problem formulation with PEPG
We first test whether fin spanwise rotation is beneficial for the efficiency of thrust generation by optimising the rotation amplitude and phase delay (relative to flapping) for the nine caudal-fin designs (fin-rotation experiments). The fin kinematic parameters (i.e. the policy parameters) to be optimised are γ_m and ψ_0, while the other parameters are held constant during learning. This results in a total of four learned parameters, [μ_γm, σ_γm, μ_ψ0, σ_ψ0]^T. In the second stage of this work (fin-flapping experiments), we optimise the flapping trajectory assuming zero fin rotation, where K and ϕ_m are the two kinematic parameters to be optimised; accordingly, the parameter vector is [μ_K, σ_K, μ_ϕm, σ_ϕm]^T.
In this work, the cumulative reward function maximised through the PS is the efficiency of thrust generation, which is defined as the ratio of the cycle-averaged thrust generated by the fin to the cycle-averaged hydrodynamic power consumption. The thrust is obtained directly from the force measurement, and the hydrodynamic power is calculated from the prescribed angular velocity of the fin and the measured hydrodynamic torque acting on it. Note that the optimisation of fin kinematics in this work considers only the dimensional case; in other words, we aim only to optimise the defined efficiency for our specific robotic fish design, without investigating the dimensionless parameter space. Each rollout consists of four flapping cycles, and the cycle averaging is performed between the start of the third cycle and the start of the fourth cycle, when the wake of the caudal fin is fully developed and the measurements have reached steady state. In addition, a linear-phase low-pass FIR filter is used to remove high-frequency noise from the measured forces and torques in real time. The instantaneous power is calculated as:

\[ P(t) = T(t)\,\omega(t) \]

where T(t) is the instantaneous torque measured by the sensor and ω(t) is the instantaneous angular velocity.
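The reward computation described above can be sketched in Python as follows. The function name and the synthetic signals (a constant thrust and a damping-like torque proportional to angular velocity) are hypothetical and serve only to make the example checkable; the sample rate, flapping frequency, four-cycle rollout and third-cycle averaging window come from the text. Filtering is assumed to have been applied already.

```python
import numpy as np

FS = 250.0      # sensor sample rate in Hz (from the text)
F_FLAP = 0.25   # flapping frequency in Hz (from the text)


def thrust_efficiency(thrust, torque, omega, cycle=3, fs=FS, f=F_FLAP):
    """Cycle-averaged thrust divided by cycle-averaged power over one cycle."""
    n = int(round(fs / f))                        # samples per flapping cycle
    window = slice((cycle - 1) * n, cycle * n)    # third cycle by default
    power = torque * omega                        # instantaneous power P = T * omega
    return float(np.mean(thrust[window]) / np.mean(power[window]))


# Synthetic four-cycle rollout (illustrative values only).
t = np.arange(4 * 1000) / FS
amp = np.deg2rad(30.0)                            # flapping amplitude in rad
omega = amp * 2.0 * np.pi * F_FLAP * np.cos(2.0 * np.pi * F_FLAP * t)
torque = 0.02 * omega                             # hypothetical damping-like torque, N m
thrust = np.full_like(t, 0.05)                    # hypothetical constant thrust, N
eta = thrust_efficiency(thrust, torque, omega)
```

With thrust in N and power in W, the reward has units of s/m, which reflects the dimensional nature of the efficiency used here.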

Optimisation of caudal-fin rotation
Here, we first test whether fin spanwise rotation is beneficial for the efficiency of thrust generation by optimising the rotation amplitude and phase delay (relative to flapping) for the nine caudal-fin designs, corresponding to nine sets of experiments. Surface friction of the caudal fins is not considered in the following experiments, since all caudal fins have relatively smooth surfaces and the effect of friction is negligible. For all experiments, the flapping amplitude is fixed at 30° (half stroke) and both the flapping and rotation frequencies are 0.25 Hz. All nine experiments start from the initial conditions γ_m = 30° (rotation amplitude) and ψ_0 = 45° (phase delay). The results (Fig. 4) show that the rotation motion does not contribute to the efficiency of thrust generation, as the efficiency is maximised at zero rotation amplitude for all caudal fins tested. Different initial conditions were also tested, and the rotation amplitude eventually converged to zero in all cases.
Specifically, Fig. 4 shows that the rotation amplitudes in all nine cases gradually converge to the vicinity of 0°, which yields the highest efficiency. In all fin configurations, the rotation amplitude first increases and then starts to decrease. The reinforcement learning process consists of 40 episodes, which means that the learning parameters are updated 40 times. Each episode includes five rollouts. Note that the rotation amplitude oscillates around 0° because of the non-zero variance that is needed for exploration. Fig. 5 shows the maximal cumulative reward (i.e. the defined efficiency of thrust generation) obtained in each experiment. For the same fin shape, the efficiency increases as the fin becomes more flexible. Interestingly, the fins with a 30° inclined angle have the highest efficiency among the shapes at every level of flexibility. The caudal fin with a 30° inclined angle made of 1/32-inch-thick PTFE (E 3 30) has the highest efficiency among all experimental cases. At the same flexibility, caudal fins with a 30° inclined angle generate the highest thrust with only a small increase in power. Stiffer caudal fins generate higher propulsive forces than more flexible fins, but they consume proportionally higher power and are found to be less efficient.

Optimisation of caudal-fin flapping
From the above results, it can be concluded that fin rotation does not contribute to the efficiency of thrust generation. Therefore, in the second stage of this work, PS is applied to optimise only the flapping motion, in terms of its amplitude and velocity profile, with the rotation angle set to 0°. The results show that smaller flapping amplitudes yield higher efficiency and that linear flapping trajectories are preferred over sinusoidal ones.
Specifically, all nine sets of experiments start from the same initial conditions (i.e. a 20° initial flapping amplitude and K = 0.4). The flapping frequency is fixed at 0.25 Hz, the flapping amplitude is bounded between 10° and 30°, which encompasses the range observed in ostraciiform fishes [30], and K is bounded between 0 and 0.8. If the policy explores values beyond these ranges, the robot executes the corresponding boundary value instead. The convergence results show that smaller flapping amplitudes and larger K values are preferred in all cases, which indicates that higher efficiency of thrust generation is achieved with smaller flapping amplitudes and more linear velocity profiles regardless of the shape and stiffness of the caudal fin. However, it should also be pointed out that the generated thrust is relatively small when the efficiency is high, meaning that the efficiency increases as the thrust decreases. Fig. 6 shows the maximal cumulative reward obtained in each experiment. For the same fin shape, the efficiency increases as the fin becomes more flexible, as observed in the first stage. Moreover, caudal fins with a 30° inclined angle still return the highest rewards among the shapes, and consequently the most flexible fin with a 30° inclined angle (E 3 30) has the highest efficiency among all nine caudal fins tested. The trends of thrust and power (Fig. 7) are also quite similar to those observed in the first stage. Within the same shape (i.e. the same inclined angle), the stiffer fins generate higher propulsive forces while consuming proportionally higher power, leading to lower efficiency.
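The handling of out-of-range exploration described above amounts to saturating each sampled parameter at its bound before it is sent to the robot. A minimal Python sketch follows; the function and constant names are hypothetical, and only the numerical bounds are taken from the text.

```python
import numpy as np

PHI_M_BOUNDS_DEG = (10.0, 30.0)   # flapping amplitude bounds in degrees (from the text)
K_BOUNDS = (0.0, 0.8)             # shape-parameter bounds (from the text)


def saturate(phi_m_deg, K):
    """Clip a sampled (phi_m, K) pair to the range the robot actually executes."""
    phi_m_exec = float(np.clip(phi_m_deg, *PHI_M_BOUNDS_DEG))
    K_exec = float(np.clip(K, *K_BOUNDS))
    return phi_m_exec, K_exec


# A sample drawn outside both ranges is executed at the nearest boundary values.
executed = saturate(35.0, 0.9)
```

Because the executed parameters can differ from the sampled ones at the boundary, the learned mean may drift slightly past a bound while the robot's behaviour stays pinned at it.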

Conclusions and future work
This work aims to optimise the caudal-fin motion patterns in two stages using the PEPG algorithm. The first stage reveals that fin rotation does not contribute to the efficiency of thrust generation, and accordingly the second stage optimises only the flapping trajectories for efficiency. The work also investigates the effects of fin flexibility by testing nine caudal fins with different materials and thicknesses. The results show that more flexible fins generate thrust with higher efficiency (i.e. consuming less power); however, they generate less thrust than the rigid fins. Moreover, fins with a 30° inclined angle have the highest efficiency among the shapes regardless of the fin flexibility. It is also shown that a smaller flapping amplitude results in higher efficiency of thrust generation, and that a more triangular flapping pattern (i.e. a linear velocity profile) yields higher efficiency than more sinusoidal trajectories.
In future work, the efficiency of swimming at different forward speeds will be optimised for the robotic fish, and the effects of fish body shape will also be investigated. In addition, with the optimised fin trajectory, shape and flexibility, the underlying fluid mechanics will be studied experimentally, for example using particle tracking velocimetry (PTV) techniques [34].