What the hell is reinforcement learning and how does it work? - The Next Web



Posted: November 2, 2020 at 1:56 am

Reinforcement learning is a subset of machine learning. It enables an agent to learn through the consequences of actions in a specific environment. It can be used to teach a robot new tricks, for example.

Reinforcement learning is a behavioral learning model in which the algorithm receives feedback on the outcome of its actions, which steers it toward the best result.

It differs from supervised learning because no labeled sample data set trains the machine. Instead, it learns by trial and error. A series of right decisions therefore reinforces the approach that solves the problem best.

Reinforcement learning is similar to how we humans learn as children. We all went through this kind of reinforcement: when you started crawling and tried to get up, you fell over and over, but your parents were there to lift you and teach you.

It is teaching based on experience, in which the machine must deal with what went wrong before and look for the right approach.

Although we do define the reward policy (that is, the game rules), we don't give the model any tips or advice on how to solve the game. It is up to the model to figure out how to execute the task to maximize the reward, beginning with random trials and ending with sophisticated tactics.

By exploiting the power of search and many trials, reinforcement learning is currently one of the most effective ways to hint at machine creativity. Unlike humans, an artificial intelligence can gather experience from thousands of parallel games, provided the reinforcement learning algorithm runs on sufficiently powerful computer infrastructure.

An example of reinforcement learning is the recommendation system on YouTube. After you watch a video, the platform shows you similar titles that it believes you will like. However, suppose you start watching a recommendation and do not finish it. In that case, the machine concludes that the recommendation was not a good one and will try another approach next time.
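To make the trial-and-error idea concrete, here is a minimal sketch of an agent that only ever sees rewards and still finds the best option. It uses an epsilon-greedy strategy on a toy slot-machine problem. The function name `run_bandit`, the payout probabilities, and the exploration rate are all assumptions chosen for illustration, not something taken from the article.

```python
import random

def run_bandit(win_probs, steps=5000, epsilon=0.1, seed=0):
    """Learn which arm pays best using nothing but sampled rewards."""
    rng = random.Random(seed)
    n_arms = len(win_probs)
    estimates = [0.0] * n_arms   # running average reward per arm
    counts = [0] * n_arms        # how often each arm was tried
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore: a random trial
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])  # exploit best guess
        # The environment answers with a reward of 1 or 0 and no other hint.
        reward = 1.0 if rng.random() < win_probs[arm] else 0.0
        counts[arm] += 1
        # Incremental mean: nudge the estimate toward the observed reward.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return estimates, counts

# Hypothetical payout probabilities; the agent is never told these values.
estimates, counts = run_bandit([0.1, 0.5, 0.9])
best_arm = max(range(len(estimates)), key=lambda a: estimates[a])
```

The agent starts out guessing and ends up pulling the best arm most of the time, which is the same arc from random trials to a learned strategy, just at a much smaller scale.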


Reinforcement learning's key challenge is preparing the simulation environment, which depends heavily on the task to be performed. When the model is trained on Chess, Go, or Atari games, preparing the simulation environment is relatively easy. For a model capable of driving an autonomous car, building a realistic simulator is key before letting the car onto the street. The model must work out how to brake or avoid a collision in a safe environment. Transferring the model from the training setting to the real world is where things become problematic.

Scaling and modifying the agent's neural network is another problem. There is no way to communicate with the network except through rewards and penalties. This may lead to catastrophic forgetting, where gaining new information causes some of the old knowledge to be erased from the network. In other words, what the agent has already learned must somehow be kept in its memory.

Another difficulty is reaching a local optimum; that is, the agent executes the task, but not in the ideal or required manner. A hopper jumping like a kangaroo instead of doing what is expected of it is a perfect example. Finally, some agents can maximize the reward without completing their task.

Games

RL is so well known today because it is the mainstream algorithm used to solve different games, sometimes achieving superhuman performance.

The most famous must be AlphaGo and AlphaGo Zero. AlphaGo, trained on countless human games, achieved superhuman performance by using a value network and Monte Carlo tree search (MCTS) in its policy network. However, the researchers then tried a purer RL approach: training it from scratch. They let the new agent, AlphaGo Zero, play against itself, and it finally defeated AlphaGo 100 to 0.

Personalized recommendations

News recommendation has always faced several challenges, including rapidly changing news dynamics, users who tire easily, and a click-through rate that cannot reflect the user retention rate. Guanjie et al. applied RL to a news recommendation system in a paper titled "DRN: A Deep Reinforcement Learning Framework for News Recommendation" to tackle these problems.

In practice, they built four categories of features: A) user features and B) context features, which serve as the state features of the environment, and C) user-news features and D) news features, which serve as the action features. The four feature sets were fed into a Deep Q-Network (DQN) to calculate the Q value. A list of news items was chosen for recommendation based on the Q value, and the user's clicks on the news were part of the reward the RL agent received.

The authors also employed other techniques to solve other challenging problems, including memory replay, survival models, Dueling Bandit Gradient Descent, and so on.

Resource management in computer clusters

Designing algorithms to allocate limited resources to different tasks is challenging and usually requires human-generated heuristics.

The paper "Resource Management with Deep Reinforcement Learning" explains how to use RL to automatically learn how to allocate and schedule computer resources for waiting jobs, with the goal of minimizing the average job slowdown.

The state space was formulated as the current resource allocation and the resource profile of the jobs. For the action space, they used a trick that allows the agent to choose more than one action at each time step. The reward was the sum of (-1 / job duration) across all jobs in the system. They then combined the REINFORCE algorithm with a baseline value to calculate the policy gradients and find the policy parameters that give the probability distribution over actions that minimizes the objective.
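The reward and the baseline are easier to see in code. The sketch below computes the per-step reward described above and the "return minus baseline" weights that REINFORCE multiplies into the policy gradient. The helper names and the specific baseline (the mean return at each time step across episodes) are assumptions for illustration. The policy network and the gradient step are left out.

```python
def step_reward(job_durations):
    """Reward for one time step: sum of -1/T over every job still in the system."""
    return sum(-1.0 / t for t in job_durations)

def returns_to_go(rewards, gamma=1.0):
    """Discounted sum of future rewards, computed for each time step."""
    out, running = [], 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        out.append(running)
    return out[::-1]

def advantages(episodes, gamma=1.0):
    """REINFORCE weights: return-to-go minus a per-step baseline.

    Assumption: the baseline is the mean return at that time step across
    the sampled episodes; other baselines would work as well.
    """
    all_returns = [returns_to_go(ep, gamma) for ep in episodes]
    horizon = min(len(r) for r in all_returns)
    baseline = [sum(r[t] for r in all_returns) / len(all_returns)
                for t in range(horizon)]
    return [[r[t] - baseline[t] for t in range(horizon)] for r in all_returns]

# Two jobs with durations 2 and 4 are in the system during this step.
r = step_reward([2, 4])
# Two hypothetical episodes, each a list of per-step rewards.
adv = advantages([[-1.0, -0.5], [-2.0, -0.5]])
```

One way to read the reward: a job of duration T collects -1/T for every step it spends in the system, so over its lifetime it accumulates roughly minus its slowdown. Subtracting the baseline does not change the expected gradient but typically reduces its variance.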

Traffic light control

In the paper "Reinforcement learning-based multi-agent system for network traffic signal control", the researchers tried to design a traffic light controller to solve the congestion problem. Tested only in a simulated environment, their methods showed results superior to traditional methods and shed light on multi-agent RL's possible uses in traffic systems design.

Five agents were placed in a five-intersection traffic network, with an RL agent at the central intersection controlling the traffic signals. The state was defined as an eight-dimensional vector, with each element representing the relative traffic flow of one lane. Eight actions were available to the agent, each representing a combination of phases, and the reward function was defined as the reduction in delay compared to the previous step. The authors used DQN to learn the Q value of {state, action} pairs.
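As a rough sketch of the quantities involved, the snippet below computes the delay-based reward and the target value that a DQN is trained to predict for a {state, action} pair. The function names, the discount factor, and the example numbers are assumptions. The neural network itself is omitted, so the next-state Q values are passed in as a plain list.

```python
N_ACTIONS = 8  # one action per combination of signal phases, as described above

def delay_reward(prev_delay, curr_delay):
    """Positive when the delay dropped compared with the previous step."""
    return prev_delay - curr_delay

def td_target(reward, next_q_values, gamma=0.9):
    """Value the Q function is regressed toward: r + gamma * max over a' of Q(s', a')."""
    return reward + gamma * max(next_q_values)

def greedy_action(q_values):
    """Index of the phase combination with the highest estimated Q value."""
    return max(range(len(q_values)), key=lambda a: q_values[a])

# Hypothetical Q estimates for the next state, one per action.
next_q = [0.0] * (N_ACTIONS - 1) + [2.0]
target = td_target(delay_reward(12.0, 9.0), next_q, gamma=0.5)
```

In a full DQN, `next_q` would come from a forward pass of the network on the next eight-dimensional state vector, and the network weights would be adjusted to bring its prediction closer to `target`.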

Robotics

There is a great deal of impressive work on applying RL to robotics. We recommend reading this paper on the results of RL research in robotics. In this other work, the researchers trained a robot to learn policies that map raw video images to the robot's actions. The RGB images were fed into a CNN, and the outputs were the motor torques. The RL component was guided policy search, which generated training data from its own state distribution.

Web systems configuration

There are more than 100 configurable parameters in a web system, and tuning them requires a skilled operator and numerous trial-and-error tests.

The paper "A Reinforcement Learning Approach to Online Web Systems Auto-configuration" was the first attempt in the domain to autonomously reconfigure parameters in multi-tier web systems in dynamic VM-based environments.

The reconfiguration process can be formulated as a finite MDP. The state-space was the system configuration; the action space was {increase, decrease, maintain} for each parameter. The reward was defined as the difference between the intended response time and the measured response time. The authors used the Q-learning algorithm to perform the task.
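The formulation above maps directly onto tabular Q-learning. Below is a minimal sketch for a single made-up parameter. The response-time curve, the target, the parameter range, and the purely random exploration are all assumptions for illustration and are not taken from the paper.

```python
import random

ACTIONS = {"increase": 1, "decrease": -1, "maintain": 0}
LOW, HIGH = 0, 10   # hypothetical range of one configuration parameter
TARGET_RT = 0.5     # hypothetical target response time, in seconds

def measured_rt(value):
    """Made-up response-time curve that is best when the parameter equals 7."""
    return 0.2 + 0.1 * abs(value - 7)

def step(state, action):
    """Apply the action, then reward = target response time - measured one."""
    next_state = min(HIGH, max(LOW, state + ACTIONS[action]))
    reward = TARGET_RT - measured_rt(next_state)
    return next_state, reward

def q_learning(episodes=5000, horizon=20, alpha=0.5, gamma=0.9, seed=0):
    rng = random.Random(seed)
    names = list(ACTIONS)
    q = {(s, a): 0.0 for s in range(LOW, HIGH + 1) for a in names}
    for _ in range(episodes):
        state = rng.randint(LOW, HIGH)
        for _ in range(horizon):
            # Q-learning is off-policy, so random exploration is enough here.
            action = rng.choice(names)
            next_state, reward = step(state, action)
            best_next = max(q[(next_state, a)] for a in names)
            # Move Q(s, a) toward reward + gamma * max over a' of Q(s', a').
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
    return q

q = q_learning()
# Greedy policy: for each parameter value, the action with the highest Q value.
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(LOW, HIGH + 1)}
```

After training, the greedy policy raises the parameter when it is too low, lowers it when it is too high, and holds it at the sweet spot. A real web system has many parameters, which is where the large state space mentioned in the article comes from.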

The authors used other techniques, such as policy initialization, rather than a combination of RL and neural networks to cope with the large state space and the computational complexity of the problem. Even so, this pioneering work is believed to have paved the way for future research in this area.

Chemistry

RL can also be applied to optimize chemical reactions. In the paper "Optimizing Chemical Reactions with Deep Reinforcement Learning", researchers showed that their model outperformed a state-of-the-art algorithm and generalized to different underlying mechanisms.

With an LSTM modeling the policy function, the RL agent optimized the chemical reaction as a Markov decision process (MDP) characterized by {S, A, P, R}, where S was the set of experimental conditions (such as temperature, pH, etc.), A was the set of all possible actions that can change the experimental conditions, P was the probability of transitioning from the current experimental condition to the next one, and R was the reward, which is a function of the state.

The application is an excellent demonstration of how RL can reduce time-consuming trial-and-error work in a relatively stable environment.
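To show how the four pieces of {S, A, P, R} fit together, here is a toy sketch. The state is a (temperature, pH) pair, and both the transition noise and the yield surface are invented for illustration. Nothing here reproduces the paper's actual chemistry or its LSTM policy.

```python
import random
from typing import Callable, NamedTuple, Sequence, Tuple

Conditions = Tuple[float, float]  # S: (temperature in C, pH), a hypothetical choice
Change = Tuple[float, float]      # A: adjustment applied to each condition

class ReactionMDP(NamedTuple):
    actions: Sequence[Change]
    transition: Callable[[Conditions, Change, random.Random], Conditions]  # samples from P
    reward: Callable[[Conditions], float]                                  # R, a function of the state

def noisy_transition(s, a, rng):
    """P: apply the requested change plus a small random drift (toy model)."""
    return (s[0] + a[0] + rng.gauss(0, 0.5), s[1] + a[1] + rng.gauss(0, 0.05))

def toy_yield(s):
    """R: made-up yield surface that peaks at 80 C and pH 7."""
    return -((s[0] - 80.0) / 10.0) ** 2 - (s[1] - 7.0) ** 2

mdp = ReactionMDP(
    actions=[(5.0, 0.0), (-5.0, 0.0), (0.0, 0.5), (0.0, -0.5)],
    transition=noisy_transition,
    reward=toy_yield,
)

def rollout(mdp, start, policy, steps, seed=0):
    """Run a policy for a few steps and add up the rewards it collects."""
    rng = random.Random(seed)
    state, total = start, 0.0
    for _ in range(steps):
        action = policy(state, mdp.actions)
        state = mdp.transition(state, action, rng)
        total += mdp.reward(state)
    return state, total

# A trivial policy that always raises the temperature by 5 degrees.
final_state, total_reward = rollout(mdp, (60.0, 7.0), lambda s, acts: acts[0], steps=3)
```

An RL agent would replace the trivial policy with one that is adjusted after each rollout so that the collected reward goes up.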

Auctions and advertising

Researchers at Alibaba Group published the paper "Real-Time Bidding with Multi-Agent Reinforcement Learning in Display Advertising". They stated that their cluster-based distributed multi-agent solution (DCMAB) achieved promising results and that they therefore plan to run a live test on the Taobao platform.

Generally speaking, the Taobao ad platform is a place where merchants bid to show ads to customers. This can be framed as a multi-agent problem because merchants bid against each other, and their actions are interrelated. In the paper, merchants and customers were clustered into different groups to reduce computational complexity. The agents' state space indicated the agents' cost-revenue status, the action space was the (continuous) bid, and the reward was the revenue from the customer cluster.

Deep learning

More and more attempts to combine RL and other deep learning architectures can be seen recently and have shown impressive results.

One of the most influential works in RL is DeepMind's pioneering work combining CNNs with RL. In doing so, the agent can see the environment through high-dimensional sensors and then learn to interact with it.

RNNs with RL are another combination people use to try new ideas. An RNN is a type of neural network that has memory. When combined with RL, an RNN gives agents the ability to memorize things. For example, researchers combined LSTM with RL to create a Deep Recurrent Q-Network (DRQN) for playing Atari 2600 games. LSTM with RL has also been used to solve problems in optimizing chemical reactions.

DeepMind showed how to use generative models and RL to generate programs. In the model, the adversarially trained agent used the signal as a reward for improving its actions, rather than propagating gradients to the input space as in GAN training. Incredible, isn't it?

Reinforcement is done with rewards according to the decisions made, so it is possible to learn continuously from interactions with the environment. Each correct action earns a positive reward, and each incorrect decision earns a penalty. In industry, this type of learning can help optimize processes, simulations, monitoring, maintenance, and the control of autonomous systems.

Some criteria can be used in deciding where to use reinforcement learning:

In addition to industry, reinforcement learning is used in various fields such as education, health, finance, image, and text recognition.

This article was written by Jair Ribeiro and was originally published on Towards Data Science. You can read it here.

Published October 27, 2020 10:49 UTC
