
Date: 2024-07-21  Source: hfw.cc  Author: hfw.cc



COMP9414 24T2
Artificial Intelligence
Assignment 2 - Reinforcement Learning
Due: Week 9, Wednesday, 24 July 2024, 11:55 PM.
1 Problem context
Taxi Navigation with Reinforcement Learning: In this assignment,
you are asked to implement Q-learning and SARSA methods for a taxi
navigation problem. To run your experiments and test your code, you should
make use of the Gym library [1], an open-source Python library for developing
and comparing reinforcement learning algorithms. You can install Gym on
your computer simply by using the following command in your command
prompt:
    pip install gym
In the taxi navigation problem, there are four designated locations in the
grid world indicated by R(ed), G(reen), Y(ellow), and B(lue). When the
episode starts, one taxi starts off at a random square and the passenger is
at a random location (one of the four specified locations). The taxi drives
to the passenger’s location, picks up the passenger, drives to the passenger’s
destination (another one of the four specified locations), and then drops off
the passenger. Once the passenger is dropped off, the episode ends. To show
the taxi grid world environment, you can use the following code:
    import gym
    env = gym.make("Taxi-v3", render_mode="ansi").env
    # Gym 0.26+ returns (observation, info); older versions return only the observation.
    state, info = env.reset()
    rendered_env = env.render()
    print(rendered_env)
[1] https://www.gymlibrary.dev/environments/toy_text/taxi/
In order to render the environment, there are three modes known as
“human”, “rgb_array”, and “ansi”. The “human” mode visualizes the
environment in a way suitable for human viewing, and the output is a graphical
window that displays the current state of the environment (see Fig. 1). The
“rgb_array” mode provides the environment’s state as an RGB image, and
the output is a numpy array representing the RGB image of the environment.
The “ansi” mode provides a text-based representation of the environment’s
state, and the output is a string that represents the current state of the
environment using ASCII characters (see Fig. 2).
Figure 1: “human” mode presentation for the taxi navigation problem in
Gym library.
You are free to choose the presentation mode between “human” and
“ansi”, but for simplicity, we recommend “ansi” mode. Based on the given
description, there are six discrete deterministic actions that are presented in
Table 1.
Figure 2: “ansi” mode presentation for the taxi navigation problem in Gym
library. Gold represents the taxi location, blue is the pickup location, and
purple is the drop-off location.
Table 1: Six possible actions in the taxi navigation environment.
Action               Number of the action
Move South           0
Move North           1
Move East            2
Move West            3
Pickup Passenger     4
Drop off Passenger   5
For this assignment, you need to implement the Q-learning and SARSA
algorithms for the taxi navigation environment. The main objective is for
the agent (taxi) to learn how to navigate the grid-world and deliver the
passenger in the minimum possible number of steps. To accomplish the
learning task, you should empirically determine the hyperparameters of
your algorithm, e.g., the learning rate α, the exploration parameters (such
as ε or T), and the discount factor γ. Your agent should be penalized -1
per step it takes, receive a +20 reward for delivering the passenger, and
incur a -10 penalty for executing “pickup” and “drop-off” actions illegally.
You should try different exploration parameters to find the best balance
between exploration and exploitation.
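The exploration and update rules referred to above could be sketched as follows. This is only an illustration under stated assumptions: the Q-table is a plain list of lists indexed as q[state][action], ε is read as the epsilon-greedy exploration probability, and T is read as a softmax (Boltzmann) temperature. The function names are hypothetical and not part of the assignment.

```python
import math
import random

def epsilon_greedy(q_row, epsilon, rng=random):
    """With probability epsilon explore uniformly, otherwise act greedily."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    best = max(q_row)
    # Break ties at random so untrained states do not always pick action 0.
    return rng.choice([a for a, value in enumerate(q_row) if value == best])

def softmax_probs(q_row, temperature):
    """Boltzmann exploration (assumed meaning of T): larger T is more uniform."""
    top = max(q_row)  # subtract the maximum for numerical stability
    weights = [math.exp((value - top) / temperature) for value in q_row]
    total = sum(weights)
    return [w / total for w in weights]

def q_learning_update(q, s, a, r, s_next, done, alpha, gamma):
    """Off-policy target: reward plus discounted best value of the next state."""
    target = r if done else r + gamma * max(q[s_next])
    q[s][a] += alpha * (target - q[s][a])

def sarsa_update(q, s, a, r, s_next, a_next, done, alpha, gamma):
    """On-policy target: reward plus discounted value of the action chosen next."""
    target = r if done else r + gamma * q[s_next][a_next]
    q[s][a] += alpha * (target - q[s][a])
```

The only difference between the two updates is the bootstrap term: Q-learning uses the maximum over next actions, while SARSA uses the action the agent will actually take. An action can be sampled from the softmax probabilities with `random.choices(range(len(q_row)), weights=softmax_probs(q_row, T))[0]`.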
As an outcome, you should plot the accumulated reward per episode and
the number of steps taken by the agent in each episode for at least 1000
learning episodes for both the Q-learning and SARSA algorithms. Examples
of these two plots are shown in Figures 3–6. Please note that the provided
plots are just examples and, therefore, your plots will not be exactly like the
provided ones, as the learning parameters will differ for your algorithm.
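One way to collect the two per-episode series and plot them is sketched below. Everything here is an assumption made for illustration: `CorridorEnv` is a tiny stand-in environment (not Taxi-v3) that only mimics the Gym 0.26-style `reset`/`step` signatures so the sketch runs without Gym installed, the hyperparameter defaults are placeholders rather than recommended values, and `train` and `plot_curves` are hypothetical names.

```python
import random

class CorridorEnv:
    """Tiny stand-in with Gym 0.26-style reset/step signatures (not Taxi-v3)."""
    n_states, n_actions = 5, 2  # actions: 0 = left, 1 = right

    def reset(self):
        self.pos = 0
        return self.pos, {}

    def step(self, action):
        move = 1 if action == 1 else -1
        self.pos = max(0, min(self.n_states - 1, self.pos + move))
        terminated = self.pos == self.n_states - 1
        return self.pos, (20 if terminated else -1), terminated, False, {}

def train(env, n_states, n_actions, episodes=1000, alpha=0.1, gamma=0.9,
          epsilon=0.1, max_steps=200, sarsa=False, seed=0):
    """Tabular Q-learning (default) or SARSA; returns Q plus per-episode logs."""
    rng = random.Random(seed)
    q = [[0.0] * n_actions for _ in range(n_states)]

    def choose(state):  # epsilon-greedy with random tie-breaking
        if rng.random() < epsilon:
            return rng.randrange(n_actions)
        best = max(q[state])
        return rng.choice([a for a in range(n_actions) if q[state][a] == best])

    episode_rewards, episode_steps = [], []
    for _ in range(episodes):
        state, _info = env.reset()
        action = choose(state)
        total, steps, done = 0, 0, False
        while not done and steps < max_steps:
            next_state, reward, terminated, truncated, _info = env.step(action)
            if sarsa:
                # On-policy: bootstrap from the action that will really be taken.
                next_action = choose(next_state)
                bootstrap = q[next_state][next_action]
            else:
                # Off-policy: bootstrap from the best next action.
                bootstrap = max(q[next_state])
            target = reward + (0.0 if terminated else gamma * bootstrap)
            q[state][action] += alpha * (target - q[state][action])
            if not sarsa:
                next_action = choose(next_state)
            state, action = next_state, next_action
            total += reward
            steps += 1
            done = terminated or truncated
        episode_rewards.append(total)
        episode_steps.append(steps)
    return q, episode_rewards, episode_steps

def plot_curves(rewards, steps, title, path):
    """Save reward and step curves side by side (requires matplotlib)."""
    import matplotlib.pyplot as plt
    fig, (ax_reward, ax_steps) = plt.subplots(1, 2, figsize=(10, 4))
    ax_reward.plot(rewards)
    ax_reward.set(xlabel="Episode", ylabel="Accumulated reward")
    ax_steps.plot(steps)
    ax_steps.set(xlabel="Episode", ylabel="Steps")
    fig.suptitle(title)
    fig.savefig(path)
    plt.close(fig)
```

With Gym installed, the stand-in would presumably be replaced by `env = gym.make("Taxi-v3").env`, passing `env.observation_space.n` and `env.action_space.n` as the table dimensions. The `max_steps` cap matters in that case, because unwrapping with `.env` removes the built-in episode time limit.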
Figure 3: Q-learning reward. Figure 4: Q-learning steps.
Figure 5: SARSA reward. Figure 6: SARSA steps.
After training your algorithm, you should save your Q-values. Based on
your saved Q-table, your algorithms will be tested on at least 100 random
grid-world scenarios with the same characteristics as the taxi environment,
for both the Q-learning and SARSA algorithms, using the greedy action
selection method. Therefore, your Q-table will not be updated during
testing for the new steps.
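Since the Q-table format is left open, one simple option is JSON, sketched below. The assumptions are that the Q-table is a list of lists of floats indexed as q[state][action] and that the file name is arbitrary; the helper names are hypothetical.

```python
import json

def save_q_table(q, path):
    """Write the Q-table (list of rows, one per state) to a JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(q, f)

def load_q_table(path):
    """Read a Q-table previously written by save_q_table."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)

def greedy_action(q, state):
    """Pure exploitation: the index of the largest Q-value, with no update."""
    row = q[state]
    return max(range(len(row)), key=row.__getitem__)
```

If the Q-table is kept as a NumPy array instead, `numpy.save` and `numpy.load` would serve the same purpose.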
Your code should be able to visualize the trained agent for both the Q-
learning and SARSA algorithms. This means you should render the “Taxi-
v3” environment (you can use the “ansi” mode) and run your trained agent
from a random position. You should present the steps your agent is taking
and how the reward changes from one state to another. An example of the
visualized agent is shown in Fig. 7, where only the first six steps of the taxi
are displayed.
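A possible shape for such a visualization is sketched below. It assumes an environment created with `render_mode="ansi"` and the Gym 0.26-style API, where `reset()` returns `(observation, info)` and `step()` returns five values, and a Q-table indexed as q[state][action]. The action names come from Table 1; `describe_step` and `replay` are hypothetical helper names.

```python
# Action numbers and names as listed in Table 1.
ACTION_NAMES = {0: "Move South", 1: "Move North", 2: "Move East",
                3: "Move West", 4: "Pickup Passenger", 5: "Drop off Passenger"}

def describe_step(step, action, reward, total):
    """One log line per step: the action taken and how the reward changes."""
    return (f"Step {step}: {ACTION_NAMES[action]} | "
            f"reward {reward} | accumulated {total}")

def replay(env, q, max_steps=100):
    """Run one greedy episode, printing each frame; returns the total reward."""
    state, _info = env.reset()
    print(env.render())  # in "ansi" mode render() returns a text frame
    total = 0
    for step in range(1, max_steps + 1):
        row = q[state]
        action = max(range(len(row)), key=row.__getitem__)  # greedy choice
        state, reward, terminated, truncated, _info = env.step(action)
        total += reward
        print(describe_step(step, action, reward, total))
        print(env.render())
        if terminated or truncated:
            break
    return total
```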
2 Testing and discussing your code
As part of the assignment evaluation, your code will be tested by tutors
along with you in a discussion carried out in the tutorial session in week 10.
The assignment has a total of 25 marks. The discussion is mandatory and,
therefore, we will not mark any assignment not discussed with tutors.
Figure 7: The first six steps of a trained agent (taxi) based on Q-learning
algorithm.
Before your discussion session, you should prepare the necessary code for
this purpose by loading your Q-table and the “Taxi-v3” environment. You
should be able to calculate the average number of steps per episode and the
average accumulated reward (for a maximum of 100 steps for each episode)
for the test episodes (using the greedy action selection method).
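The two averages could be computed along the following lines. This is a sketch under assumptions: `CorridorEnv` is a small stand-in (not Taxi-v3) that mimics the Gym 0.26-style `reset`/`step` signatures so the example runs without Gym, and `evaluate` is a hypothetical name. The Q-table is only read, never updated.

```python
class CorridorEnv:
    """Tiny stand-in with Gym 0.26-style reset/step signatures (not Taxi-v3)."""
    n_states = 5  # actions: 0 = left, 1 = right

    def reset(self):
        self.pos = 0
        return self.pos, {}

    def step(self, action):
        move = 1 if action == 1 else -1
        self.pos = max(0, min(self.n_states - 1, self.pos + move))
        terminated = self.pos == self.n_states - 1
        return self.pos, (20 if terminated else -1), terminated, False, {}

def evaluate(env, q, episodes=100, max_steps=100):
    """Greedy test run; returns (average steps, average accumulated reward)."""
    total_steps = 0
    total_reward = 0
    for _ in range(episodes):
        state, _info = env.reset()
        for _ in range(max_steps):  # cap each test episode at max_steps
            row = q[state]
            action = max(range(len(row)), key=row.__getitem__)
            state, reward, terminated, truncated, _info = env.step(action)
            total_reward += reward
            total_steps += 1
            if terminated or truncated:
                break
    return total_steps / episodes, total_reward / episodes
```

For the assignment itself, `env` would presumably be the “Taxi-v3” environment and `q` the table loaded from the submitted file.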
You are expected to propose and build your algorithms for the taxi
navigation task. You will receive marks for each of these subsections as shown
in Table 2. Beyond what has been mentioned in the previous section, you
are welcome to include any other outcome that highlights particular aspects
when testing and discussing your code with your tutor.
For both Q-learning and SARSA algorithms, your tutor will consider the
average accumulated reward and the average taken steps for the test episodes
in the environment for a maximum of 100 steps for each episode. For your Q-
learning algorithm, the agent should perform at most 14 steps per episode on
average and obtain a minimum of 7 average accumulated reward. Numbers
worse than that will result in a score of 0 marks for that specific section.
For your SARSA algorithm, the agent should perform at most 15 steps per
episode on average and obtain a minimum of 5 average accumulated reward.
Numbers worse than that will result in a score of 0 marks for that specific
section.
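A quick self-check against these limits might look like the sketch below. The numbers are the ones quoted above; the dictionary keys and the function name are hypothetical, and how the tutors actually apply the thresholds is not specified here.

```python
# Limits quoted in the text: (maximum average steps, minimum average reward).
THRESHOLDS = {"q_learning": (14, 7), "sarsa": (15, 5)}

def meets_threshold(algorithm, avg_steps, avg_reward):
    """True when both test averages are within the limits for the algorithm."""
    max_steps, min_reward = THRESHOLDS[algorithm]
    return avg_steps <= max_steps and avg_reward >= min_reward
```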
Finally, you will receive 1 mark for code readability for each task, and
your tutor will also give you a maximum of 5 marks for each task depending
on the level of code understanding as follows: 5. Outstanding, 4. Great,
3. Fair, 2. Low, 1. Deficient, 0. No answer.
Table 2: Marks for each task.
Task                                                                        Marks
Results obtained from agent learning
  Accumulated rewards and steps per episode plots for Q-learning algorithm  2 marks
  Accumulated rewards and steps per episode plots for SARSA algorithm       2 marks
Results obtained from testing the trained agent
  Average accumulated rewards and average steps per episode for
  Q-learning algorithm                                                      2.5 marks
  Average accumulated rewards and average steps per episode for
  SARSA algorithm                                                           2.5 marks
  Visualizing the trained agent for Q-learning algorithm                    2 marks
  Visualizing the trained agent for SARSA algorithm                         2 marks
Code understanding and discussion
  Code readability for Q-learning algorithm                                 1 mark
  Code readability for SARSA algorithm                                      1 mark
  Code understanding and discussion for Q-learning algorithm                5 marks
  Code understanding and discussion for SARSA algorithm                     5 marks
Total marks                                                                 25 marks
3 Submitting your assignment
The assignment must be done individually. You must submit your assignment
solution via Moodle. This will consist of a single .zip file containing three
files: the .ipynb Jupyter notebook and your saved Q-tables for Q-learning and
SARSA (you can choose the format for the Q-tables). Remember that the files
with your Q-tables will be loaded during your discussion session to run the
test episodes. Therefore, you should also provide a script in your Python
code at submission to perform these tests. Additionally, your code should
include short text descriptions to help markers better understand your code.
Please be mindful that providing clean and easy-to-read code is a part of
your assignment.
Please indicate your full name and your zID at the top of the file as a
comment. You can submit as many times as you like before the deadline –
later submissions overwrite earlier ones. After submitting your file, it is good
practice to take a screenshot of it for future reference.
Late submission penalty: UNSW has a standard late submission
penalty of 5% per day from your mark, capped at five days from the
assessment deadline. After that, students cannot submit the assignment.
4 Deadline and questions
Deadline: Week 9, Wednesday, 24 July 2024, 11:55 PM. Please use the
forum on Moodle to ask questions related to the project. We will prioritise
questions asked in the forum. However, you should not share your code
there, to avoid making it public and enabling plagiarism. If your question
requires sharing code, use the course email cs9414@cse.unsw.edu.au as an
alternative.
Although we try to answer questions as quickly as possible, we might take
up to 1 or 2 business days to reply; therefore, last-minute questions might
not be answered in time.
For any questions regarding the discussion sessions, please contact your
tutor directly. You can find your tutor’s email address in Table 3.
5 Plagiarism policy
Your program must be entirely your own work. Plagiarism detection software
might be used to compare submissions pairwise (including submissions for
any similar projects from previous years) and serious penalties will be applied,
particularly in the case of repeat offences.
Do not copy from others. Do not allow anyone to see your code.
Please refer to the UNSW Policy on Academic Honesty and Plagiarism if you
require further clarification on this matter.
