For an introduction to Reinforcement Learning, its basic terminology, concepts, and types, read Reinforcement Learning - Part 1 by following this link: http://blog.cerelabs.com/2017/04/reinforcement-learning-part-1.html. For an introduction to the Q-learning algorithm, Q-values, and their mathematical representation, read Reinforcement Learning - Part 2.
CRAWLER ROBOT

Robotics and AI - A Perfect Blend
Robotics and artificial intelligence have each made huge advancements in their respective fields, and the blend of the two has created some amazing things as well. It doesn't seem far in the future when robots will be able to teach themselves what humans can, using artificial intelligence. A peek into that future of robot learning and an introduction to applications of reinforcement learning is what this article is all about.
Introduction
A variety of different problems can be solved using Reinforcement Learning, as agents can learn without expert supervision. In this project we created a crawler robot. The robot uses two arms to crawl and push its body towards a wall. After reaching the wall it stops and declares that it has reached the wall. In the process of reaching the wall, the robot learns by itself the actions its arm has to take in order to move towards the wall. This is where reinforcement learning comes into the picture. Reinforcement learning enables the robot to move its arms into different positions, or states, and decide the sequence of actions that maximizes the rewards.
Terminology:
In the case of the crawler robot, the states are the positions of its arm and hand. The current state is denoted by "s" and the next state is denoted by "s'". The available actions are moving the arm up or down and moving the hand left or right; an action is denoted by "a". The rewards are given as follows: if the robot moves nearer to the wall, i.e. its distance from the wall decreases by 4 cm or more, it receives a reward of +1. If the robot moves away from the wall by 4 cm or more, it receives a reward of -2. Also, if the robot stays in a particular state without moving, it receives a reward of -2.
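To make the reward scheme concrete, here is a minimal sketch of it in Python. The function name and signature are ours for illustration; only the 4 cm threshold and the reward values come from the description above.

```python
# A minimal sketch of the reward scheme described above (names are ours).
# Distances are measured from the wall in centimetres; the 4 cm threshold
# also absorbs the ultrasonic sensor's error, discussed later in the article.

def compute_reward(old_dist, new_dist, threshold=4.0):
    if old_dist - new_dist >= threshold:
        return 1       # moved towards the wall by 4 cm or more
    if new_dist - old_dist >= threshold:
        return -2      # moved away from the wall by 4 cm or more
    return -2          # stayed in the same state without (enough) movement
```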
Reinforcement learning for the Robot:
Here we are going to use model-based reinforcement learning to train our crawler robot to move towards the wall. With model-based reinforcement learning we already know the transitions, so by observation the robot knows what state it is in. The state consists of the current distance from the wall, the current position of its arm, and the current position of its hand. The robot also knows what actions it can take: if its arm is in the up position it can move it to the down position, and similarly for the hand; if the hand is in the left position it can move it to the right position. Another thing it knows is its distance from the wall; if it is less than 10 cm, the robot has to stop its motors.
Based on the data mentioned above, we know the reward it gets for staying in the same position, which is -2. So, having the transitions, the robot starts modelling the environment into a proper model for itself. It now knows that only moving towards the wall is going to get it positive rewards; hence, it knows its possible future states and rewards.
After observing the current and future states and rewards, the robot can use value iteration to find the value function for each state. Using these values and policy iteration, it can find the best policy to get the maximum reward.
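As a rough illustration of this planning step, here is a minimal value-iteration sketch over a small, discrete state space. The transition model T, reward model R, and all names are our own stand-ins for the model the robot builds, not the robot's actual code.

```python
# A minimal value-iteration sketch for a deterministic model.
# T maps (state, action) -> next state; R maps (state, action) -> reward.

def value_iteration(states, actions, T, R, gamma=0.9, iterations=100):
    V = {s: 0.0 for s in states}                      # initial value function
    for _ in range(iterations):
        V = {s: max(R[(s, a)] + gamma * V[T[(s, a)]] for a in actions)
             for s in states}
    # Extract the greedy policy with respect to the converged values.
    policy = {s: max(actions, key=lambda a: R[(s, a)] + gamma * V[T[(s, a)]])
              for s in states}
    return V, policy
```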
To sum up: after observing the current positions of the hand and arm and the distance from the wall, the robot calculates the value of each state it could be in and the consequences it would face, and finally arrives at a best policy. In the experiment performed on the crawler robot, when it learned to move towards the wall, it figured out that when its arm was down and its hand moved first to the left and then to the right, it was getting rewards, even though this wasn't the perfect way to move towards the wall. The perfect sequence of actions would have been: move arm up, move hand left, move arm down, move hand right. But as the actions it learned to perform gave it results, it kept performing the same actions repeatedly until it reached the wall.
Now, let us see how the robot learned this procedure step by step.
Hardware of the crawler robot:
It consists of a 125 x 105 mm chassis with two 65 mm diameter dummy wheels and a single castor wheel. The length of the hand is 80 mm including the gripper. The gripper here is a Lego rubber wheel, used to help the robot push itself towards the wall while the gripper is in contact with the ground. The length of the arm is 100 mm. The arm is connected to the chassis by a 3.17 kg-cm Futaba S3003 servo motor, and to the hand by another Futaba S3003. The robot also has an ultrasonic sensor on its side, facing the wall. The motors and the ultrasonic sensor are connected to a Raspberry Pi 3 board. Connections of the Raspberry Pi to the ultrasonic sensor and the servos are given in the figures below.
In the following figures the servo is connected to the Raspberry Pi 3 and a 5 V power source. You can also power the servo using the Raspberry Pi's 5V and GND pins. The motor has three wires, commonly coloured red, brown, and yellow. Connect the yellow wire to a GPIO pin, the red wire to 5V, and the brown wire to GND on the Raspberry Pi.
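As a reference for driving such a servo from the Raspberry Pi, here is a minimal setup sketch using the RPi.GPIO library. The GPIO pin number is our assumption; use whichever pin the servo's yellow signal wire is connected to.

```python
# Minimal servo PWM setup on a Raspberry Pi (pin 18 is an assumed choice).
import RPi.GPIO as GPIO

GPIO.setmode(GPIO.BCM)
GPIO.setup(18, GPIO.OUT)

pwm = GPIO.PWM(18, 50)   # 50 Hz, the standard servo frequency
pwm.start(7.5)           # 7.5 % duty cycle is neutral for the Futaba S3003
                         # (see the timing table later in this article)
```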
Q Learning in the Crawler Robot:
The crawler robot has 4 states, namely: arm up - hand left, arm up - hand right, arm down - hand left, and arm down - hand right. It has 4 actions, namely: arm up, arm down, hand left, and hand right.
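For use in the sketches below, these states and actions could be encoded as simple lists (the encoding is our own choice):

```python
# Illustrative encoding of the 4 states and 4 actions described above.
STATES = ["arm_up_hand_left", "arm_up_hand_right",
          "arm_down_hand_left", "arm_down_hand_right"]
ACTIONS = ["arm_up", "arm_down", "hand_left", "hand_right"]
```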
Observing Transitions:
Initially all pins are set up, the motors are started and positioned at arm up - hand left, which is the zero state, and the distance from the wall is calculated. Then learning is started.
Learning Mode:
In this mode the robot initially models the environment around it and starts planning. It first observes the current positions of its hand and arm. The robot then creates a 4 x 4 (4 states and 4 actions) Q-table with random float values in the range of -1 to 1. Then it starts an episode. There are 10 episodes in total. In each episode it follows the steps below (a condensed code sketch of the whole loop follows the list):
Step 1: Get the current position of the hand and arm and decide which state the robot is in at present.
Step 2: Decide the action to be taken, depending on the maximum value in the Q-table.
Step 3: Calculate (predict) the reward for that action.
Step 4: Compare the decided action with the last action taken; if it is the same, take another action.
Step 5: After taking the action, calculate the actual reward received for it. The reward depends upon the difference between the robot's current distance from the wall and its previous distance from the wall.
Step 6: Observe the current state again and consider it as the next state, or state prime (s'), relative to the previous state.
Step 7: Choose whether the next action should be a random action or a definite action. This depends upon the randomness ε, which is compared to a random value between 0 and 1. If the random value is less than ε, choose a random action; otherwise choose the action with the maximum Q-value from the Q-table.
Step 8: Decay the randomness ε using the random-action decay rate.
Step 9: Compute Q(s,a) = (1 - α) × Q(s,a) + α × (reward + γ × Q(s',a')), and update the Q-value of state 's' and action 'a' in the Q-table to this newly found value.
Step 10: Update the state to state prime (s') and the action to action prime (a').
Step 11: Save the newly found Q-table to a .csv file.
Step 12: Take the chosen action.
Step 13: Check if the distance from the wall is less than 10 cm. If not, repeat all the steps. If yes, stop the motors and declare "Reached the wall".
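As mentioned above, here is a condensed sketch of this learning loop in Python. It reuses compute_reward, STATES, and ACTIONS from the earlier sketches; the helpers get_state, take_action, and read_distance, as well as the hyperparameter values, are our assumptions rather than the robot's actual code, and the action is applied before its reward is measured so that real sensor readings can be used.

```python
import csv
import random

ALPHA, GAMMA = 0.5, 0.9       # learning rate and discount factor (assumed values)
epsilon, decay = 0.9, 0.95    # exploration rate and its decay (assumed values)

# Create the 4 x 4 Q-table with random floats in [-1, 1], as described above.
Q = [[random.uniform(-1, 1) for _ in ACTIONS] for _ in STATES]

def best_action(s):
    """Index of the action with the maximum Q-value in state s."""
    return max(range(len(ACTIONS)), key=lambda a: Q[s][a])

s = get_state()               # Step 1: get_state() returns an index into STATES
a = best_action(s)            # Step 2
last_a = None
while True:
    if a == last_a:           # Step 4: avoid repeating the last action
        a = random.choice([x for x in range(len(ACTIONS)) if x != last_a])
    old_dist = read_distance()
    take_action(a)            # Step 12: drive the servos
    new_dist = read_distance()
    r = compute_reward(old_dist, new_dist)     # Step 5
    s_prime = get_state()                      # Step 6
    if random.random() < epsilon:              # Step 7: explore or exploit
        a_prime = random.randrange(len(ACTIONS))
    else:
        a_prime = best_action(s_prime)
    epsilon *= decay                           # Step 8
    # Step 9: the update rule from the article.
    Q[s][a] = (1 - ALPHA) * Q[s][a] + ALPHA * (r + GAMMA * Q[s_prime][a_prime])
    s, a, last_a = s_prime, a_prime, a         # Step 10
    with open("qtable.csv", "w", newline="") as f:   # Step 11
        csv.writer(f).writerows(Q)
    if new_dist < 10:                          # Step 13: within 10 cm of the wall
        print("Reached the wall")
        break
```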
Problems and Observations During Q-learning of the Crawler Robot:
- Servo motor control:
Every servo motor has a different pulse at which it goes to zero. Initially the frequency was 50 Hz and the duty cycles were values like 1, 0.5, etc., but the motor never stopped at zero. When the program was stopped using KeyboardInterrupt it used to stop at a position near zero, but not exactly at zero. When I increased the frequency to 500 Hz the motor stopped exactly at zero for a duty cycle of 0. Hence, the conclusion is that the frequency affects the final position of the motor. But at 500 Hz the motor did not rotate to its full potential, so using a 50 Hz frequency is optimal.
For the FUTABA S3003 at 50 Hz:
- 0 degrees: 1 ms pulse, duty cycle 5
- Neutral: 1.5 ms pulse, duty cycle 7.5
- 180 degrees: 2 ms pulse, duty cycle 10
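Since this mapping is linear, a small helper (ours, for illustration, using the pwm object from the wiring sketch earlier) can convert an angle into a duty cycle:

```python
# Convert a servo angle (0-180 degrees) into a duty cycle at 50 Hz,
# using the linear FUTABA S3003 mapping above: 0 deg -> 5 %, 180 deg -> 10 %.

def angle_to_duty(angle):
    return 5.0 + (angle / 180.0) * 5.0

pwm.ChangeDutyCycle(angle_to_duty(90))   # move to neutral (7.5 % duty cycle)
```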
- Same action problem
In the initial phase, step 4 was not included. Step 4 checks whether the last action taken is the same as the current action. When step 4 was not included the robot used to perform the same action again and again, which wasted a lot of time in the learning phase. Hence, the robot took a long time to learn completely.
- Delay problem
Initially we had delays between the actions taken, which weren't necessary and only slowed down the learning process.
- Ultrasonic Sensor Efficiency
The ultrasonic sensor used in this robot is the HC-SR04. It gives an error of approximately 0.01 m, so the difference between the old distance and the current distance was kept at more than 0.04 m. This sensor has some limitations. Once started, the sensor needs a settling time of 1 second, else it behaves erratically. The sensor also has a limited range: if the wall is more than 1 m away from the sensor it behaves erratically and gives random readings.
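For reference, a typical way to read the HC-SR04 from a Raspberry Pi looks like the sketch below; this also plays the role of the read_distance helper assumed in the learning-loop sketch above. The trigger and echo pin numbers are our assumptions, and the 1-second delay mirrors the settling time noted above.

```python
import time
import RPi.GPIO as GPIO

TRIG, ECHO = 23, 24          # assumed GPIO pins for trigger and echo

GPIO.setmode(GPIO.BCM)
GPIO.setup(TRIG, GPIO.OUT)
GPIO.setup(ECHO, GPIO.IN)
time.sleep(1)                # settling time, as noted above

def read_distance():
    """Return the measured distance in centimetres."""
    GPIO.output(TRIG, True)
    time.sleep(0.00001)      # 10 microsecond trigger pulse
    GPIO.output(TRIG, False)
    start = end = time.time()
    while GPIO.input(ECHO) == 0:     # wait for the echo pulse to start
        start = time.time()
    while GPIO.input(ECHO) == 1:     # wait for the echo pulse to end
        end = time.time()
    return (end - start) * 17150     # half the speed of sound, in cm/s
```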
The required difference between the current distance and the old distance is 4 cm. This means that unless the robot has moved 4 cm as a result of an action, it doesn't get a positive or negative reward for that action. Due to this, while learning, the robot would move approximately 2 cm away and then around 8 cm towards the wall. This affected the learning efficiency.
- Standing Reward
Initially the robot wasn't provided with a standing reward: if it was not moving it would get no reward. But when a negative reward was provided for not moving, the learning process improved and the learning time was reduced.