COMP9414 24T2
Artificial Intelligence
Assignment 2 - Reinforcement Learning
Due: Week 9, Wednesday, 24 July 2024, 11:55 PM.
1 Problem context
Taxi Navigation with Reinforcement Learning: In this assignment, you are asked to implement the Q-learning and SARSA methods for a taxi navigation problem. To run your experiments and test your code, you should make use of the Gym library¹, an open-source Python library for developing and comparing reinforcement learning algorithms. You can install Gym on your computer simply by using the following command in your command prompt:

pip install gym
In the taxi navigation problem, there are four designated locations in the
grid world indicated by R(ed), G(reen), Y(ellow), and B(lue). When the
episode starts, one taxi starts off at a random square and the passenger is
at a random location (one of the four specified locations). The taxi drives
to the passenger’s location, picks up the passenger, drives to the passenger’s
destination (another one of the four specified locations), and then drops off
the passenger. Once the passenger is dropped off, the episode ends. To show
the taxi grid world environment, you can use the following code:
¹ https://www.gymlibrary.dev/environments/toy_text/taxi/
import gym

env = gym.make("Taxi-v3", render_mode="ansi").env
state, info = env.reset()
rendered_env = env.render()
print(rendered_env)
In order to render the environment, there are three modes, known as "human", "rgb_array", and "ansi". The "human" mode visualizes the environment in a way suitable for human viewing, and the output is a graphical
window that displays the current state of the environment (see Fig. 1). The
“rgb_array” mode provides the environment’s state as an RGB image, and
the output is a numpy array representing the RGB image of the environment.
The “ansi” mode provides a text-based representation of the environment’s
state, and the output is a string that represents the current state of the
environment using ASCII characters (see Fig. 2).
Figure 1: “human” mode presentation for the taxi navigation problem in
Gym library.
You are free to choose the presentation mode between “human” and
“ansi”, but for simplicity, we recommend “ansi” mode. Based on the given
description, there are six discrete deterministic actions that are presented in
Table 1.
For this assignment, you need to implement the Q-learning and SARSA algorithms for the taxi navigation environment. The main objective is for the agent (taxi) to learn how to navigate the grid world and deliver the passenger in the minimum possible number of steps. To accomplish the learning task, you should empirically determine the hyperparameters, e.g., the learning rate α, the exploration parameters (such as ε or T), and the discount factor γ for your algorithm. Your agent should be penalized -1 per step it takes, receive a +20 reward for delivering the passenger, and incur a -10 penalty for executing “pickup” and “drop-off” actions illegally. You should try different exploration parameters to find the best balance between exploration and exploitation.

Figure 2: “ansi” mode presentation for the taxi navigation problem in the Gym library. Gold represents the taxi location, blue is the pickup location, and purple is the drop-off location.

Table 1: Six possible actions in the taxi navigation environment.

  Action                Number
  Move South            0
  Move North            1
  Move East             2
  Move West             3
  Pickup Passenger      4
  Drop off Passenger    5
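The ε-greedy selection and the two update rules can be sketched with plain NumPy, leaving out the environment interaction loop. This is a minimal outline, not a complete solution: the state and action counts match Taxi-v3 (500 states, 6 actions), but the values of `alpha`, `gamma`, and `epsilon` below are placeholders that you must tune empirically.

```python
import numpy as np

rng = np.random.default_rng(0)

n_states, n_actions = 500, 6            # Taxi-v3: 500 discrete states, 6 actions
alpha, gamma, epsilon = 0.1, 0.9, 0.1   # placeholder values -- tune them empirically

Q = np.zeros((n_states, n_actions))     # tabular Q-values, initialized to zero

def epsilon_greedy(Q, state, epsilon):
    """With probability epsilon explore randomly, otherwise act greedily."""
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))
    return int(np.argmax(Q[state]))

def q_learning_update(Q, s, a, r, s_next):
    """Off-policy target: bootstrap from the best action in the next state."""
    Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])

def sarsa_update(Q, s, a, r, s_next, a_next):
    """On-policy target: bootstrap from the action actually taken next."""
    Q[s, a] += alpha * (r + gamma * Q[s_next, a_next] - Q[s, a])

# One illustrative transition with the taxi's -1 step cost:
q_learning_update(Q, s=0, a=1, r=-1, s_next=42)
print(Q[0, 1])   # 0.1 * (-1 + 0.9 * 0 - 0) = -0.1
```

Note the only difference between the two methods: Q-learning bootstraps from `max(Q[s_next])`, while SARSA bootstraps from the value of the next action the policy actually selects.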
As an outcome, you should plot the accumulated reward per episode and
the number of steps taken by the agent in each episode for at least 1000
learning episodes for both the Q-learning and SARSA algorithms. Examples
of these two plots are shown in Figures 3–6. Please note that the provided
plots are just examples and, therefore, your plots will not be exactly like the
provided ones, as the learning parameters will differ for your algorithm.
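One way to produce the two required plots is to append the per-episode reward and step count to lists inside the training loop and then draw them with matplotlib. The assignment does not prescribe a plotting library, so matplotlib here is an assumption, and the numbers below are dummy placeholders rather than real learning curves.

```python
import matplotlib
matplotlib.use("Agg")            # render off-screen; use plt.show() interactively
import matplotlib.pyplot as plt

# Per-episode statistics gathered during training (dummy data here;
# in your code, append one entry per episode inside the training loop).
episode_rewards = [-200, -150, -90, -30, 5, 8]
episode_steps   = [200, 170, 110, 50, 16, 13]

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.plot(episode_rewards)
ax1.set_xlabel("Episode")
ax1.set_ylabel("Accumulated reward")
ax2.plot(episode_steps)
ax2.set_xlabel("Episode")
ax2.set_ylabel("Steps taken")
fig.savefig("learning_curves.png")   # file name is just an example
```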
After training your algorithm, you should save your Q-values. Based on your saved Q-table, your algorithms will be tested on at least 100 random grid-world scenarios with the same characteristics as the taxi environment, for both the Q-learning and SARSA algorithms, using the greedy action selection method. Therefore, your Q-table will not be updated for the new steps taken during testing.

Figure 3: Q-learning reward.   Figure 4: Q-learning steps.
Figure 5: SARSA reward.   Figure 6: SARSA steps.
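The brief leaves the Q-table format up to you; NumPy's `.npy` format is one simple option. The file name and the toy values below are illustrative only:

```python
import numpy as np

# Example trained Q-table (500 states x 6 actions); values are illustrative.
Q = np.zeros((500, 6))
Q[0, 3] = 1.0

# Save after training, reload at test time -- any reloadable format works.
np.save("q_table_qlearning.npy", Q)
Q_loaded = np.load("q_table_qlearning.npy")

def greedy_action(Q, state):
    """Test-time action selection: always the highest-valued action."""
    return int(np.argmax(Q[state]))

print(greedy_action(Q_loaded, 0))   # action 3 (Move West) for this toy table
```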
Your code should be able to visualize the trained agent for both the Q-
learning and SARSA algorithms. This means you should render the “Taxi-
v3” environment (you can use the “ansi” mode) and run your trained agent
from a random position. You should present the steps your agent is taking
and how the reward changes from one state to another. An example of the
visualized agent is shown in Fig. 7, where only the first six steps of the taxi
are displayed.
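A possible shape for such a visualization loop is sketched below, assuming the gym >= 0.26 API, where `reset()` returns `(state, info)` and `step()` returns a five-tuple. The function name and the print format are my own choices, not part of the brief.

```python
import numpy as np

def run_greedy_episode(env, Q, max_steps=100):
    """Render each step of a trained agent acting greedily from a random start."""
    state, _ = env.reset()                    # gym >= 0.26: reset() -> (obs, info)
    total_reward = 0
    for step in range(max_steps):
        action = int(np.argmax(Q[state]))     # greedy: no exploration at test time
        state, reward, done, truncated, _ = env.step(action)
        total_reward += reward
        print(env.render())                   # "ansi" mode returns a printable string
        print(f"step {step + 1}: action={action}, reward={reward}, total={total_reward}")
        if done or truncated:
            break
    return total_reward
```

For the real environment this would be called as, e.g., `run_greedy_episode(gym.make("Taxi-v3", render_mode="ansi").env, Q)` after loading your saved Q-table.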
2 Testing and discussing your code
As part of the assignment evaluation, your code will be tested by tutors
along with you in a discussion carried out in the tutorial session in week 10.
The assignment has a total of 25 marks. The discussion is mandatory and,
therefore, we will not mark any assignment not discussed with tutors.
Before your discussion session, you should prepare the necessary code for this purpose by loading your Q-table and the “Taxi-v3” environment. You should be able to calculate the average number of steps per episode and the average accumulated reward (for a maximum of 100 steps per episode) for the test episodes (using the greedy action selection method).

Figure 7: The first six steps of a trained agent (taxi) based on the Q-learning algorithm.
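These averages can be computed with a small evaluation helper along the following lines (a sketch assuming the gym >= 0.26 reset/step API; the function name is illustrative):

```python
import numpy as np

def evaluate(env, Q, episodes=100, max_steps=100):
    """Average steps and accumulated reward over test episodes, acting greedily.

    The Q-table is only read, never updated, during these test episodes.
    """
    steps_hist, reward_hist = [], []
    for _ in range(episodes):
        state, _ = env.reset()
        total, steps = 0, 0
        for _ in range(max_steps):
            action = int(np.argmax(Q[state]))
            state, reward, done, truncated, _ = env.step(action)
            total += reward
            steps += 1
            if done or truncated:
                break
        steps_hist.append(steps)
        reward_hist.append(total)
    return float(np.mean(steps_hist)), float(np.mean(reward_hist))
```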
You are expected to propose and build your own algorithms for the taxi navigation task. You will receive marks for each of the subsections shown in Table 2. Beyond what has been mentioned in the previous section, you are welcome to include any other outcomes that highlight particular aspects when testing and discussing your code with your tutor.
For both the Q-learning and SARSA algorithms, your tutor will consider the average accumulated reward and the average number of steps taken over the test episodes, with a maximum of 100 steps per episode. For your Q-learning algorithm, the agent should take at most 14 steps per episode on average and obtain an average accumulated reward of at least 7. Results worse than these thresholds will score 0 marks for that specific section. For your SARSA algorithm, the agent should take at most 15 steps per episode on average and obtain an average accumulated reward of at least 5. Results worse than these thresholds will score 0 marks for that specific section.
Finally, you will receive 1 mark for code readability for each task, and your tutor will also award up to 5 marks for each task depending on your level of code understanding, as follows: 5 Outstanding, 4 Great, 3 Fair, 2 Low, 1 Deficient, 0 No answer.
Table 2: Marks for each task.

  Results obtained from agent learning
    Accumulated rewards and steps per episode plots for the Q-learning algorithm .... 2 marks
    Accumulated rewards and steps per episode plots for the SARSA algorithm ......... 2 marks
  Results obtained from testing the trained agent
    Average accumulated rewards and average steps per episode for Q-learning ........ 2.5 marks
    Average accumulated rewards and average steps per episode for SARSA ............. 2.5 marks
    Visualizing the trained agent for the Q-learning algorithm ...................... 2 marks
    Visualizing the trained agent for the SARSA algorithm ........................... 2 marks
  Code understanding and discussion
    Code readability for the Q-learning algorithm ................................... 1 mark
    Code readability for the SARSA algorithm ........................................ 1 mark
    Code understanding and discussion for the Q-learning algorithm .................. 5 marks
    Code understanding and discussion for the SARSA algorithm ....................... 5 marks
  Total marks ....................................................................... 25 marks
3 Submitting your assignment
The assignment must be done individually. You must submit your assignment solution via Moodle. Your submission will consist of a single .zip file containing three files: your .ipynb Jupyter notebook and your saved Q-tables for Q-learning and SARSA (you can choose the format for the Q-tables). Remember that your Q-table files will be loaded during your discussion session to run the test episodes; therefore, your submitted Python code should also include a script to perform these tests. Additionally, your code should include short text descriptions to help markers better understand it.
Please be mindful that providing clean and easy-to-read code is a part of
your assignment.
Please indicate your full name and your zID at the top of the file as a comment. You can submit as many times as you like before the deadline; later submissions overwrite earlier ones. After submitting your file, a good practice is to take a screenshot of it for future reference.
Late submission penalty: UNSW has a standard late submission penalty of 5% of your mark per day, capped at five days from the assessment deadline; after that, students cannot submit the assignment.
4 Deadline and questions
Deadline: Week 9, Wednesday 24 of July 2024, 11:55pm. Please use the
forum on Moodle to ask questions related to the project. We will prioritise
questions asked in the forum. However, you should not share your code to
avoid making it public and possible plagiarism. If that’s the case, use the
course email cs9414@cse.unsw.edu.au as alternative.
Although we try to answer questions as quickly as possible, we might take
up to 1 or 2 business days to reply, therefore, last-moment questions might
not be answered timely.
For any questions regarding the discussion sessions, please contact your tutor directly. You can find your tutor's email address in Table 3.
5 Plagiarism policy
Your program must be entirely your own work. Plagiarism detection software
might be used to compare submissions pairwise (including submissions for
any similar projects from previous years) and serious penalties will be applied,
particularly in the case of repeat offences.
Do not copy from others. Do not allow anyone to see your code.
Please refer to the UNSW Policy on Academic Honesty and Plagiarism if you
require further clarification on this matter.

請(qǐng)加QQ:99515681  郵箱:99515681@qq.com   WX:codinghelp





 

掃一掃在手機(jī)打開當(dāng)前頁
  • 上一篇:COMP9021代做、代寫python設(shè)計(jì)程序
  • 下一篇:COMP6008代做、代寫C/C++,Java程序語言
  • 無相關(guān)信息
    合肥生活資訊

    合肥圖文信息
    流體仿真外包多少錢_專業(yè)CFD分析代做_友商科技CAE仿真
    流體仿真外包多少錢_專業(yè)CFD分析代做_友商科
    CAE仿真分析代做公司 CFD流體仿真服務(wù) 管路流場(chǎng)仿真外包
    CAE仿真分析代做公司 CFD流體仿真服務(wù) 管路
    流體CFD仿真分析_代做咨詢服務(wù)_Fluent 仿真技術(shù)服務(wù)
    流體CFD仿真分析_代做咨詢服務(wù)_Fluent 仿真
    結(jié)構(gòu)仿真分析服務(wù)_CAE代做咨詢外包_剛強(qiáng)度疲勞振動(dòng)
    結(jié)構(gòu)仿真分析服務(wù)_CAE代做咨詢外包_剛強(qiáng)度疲
    流體cfd仿真分析服務(wù) 7類仿真分析代做服務(wù)40個(gè)行業(yè)
    流體cfd仿真分析服務(wù) 7類仿真分析代做服務(wù)4
    超全面的拼多多電商運(yùn)營(yíng)技巧,多多開團(tuán)助手,多多出評(píng)軟件徽y1698861
    超全面的拼多多電商運(yùn)營(yíng)技巧,多多開團(tuán)助手
    CAE有限元仿真分析團(tuán)隊(duì),2026仿真代做咨詢服務(wù)平臺(tái)
    CAE有限元仿真分析團(tuán)隊(duì),2026仿真代做咨詢服
    釘釘簽到打卡位置修改神器,2026怎么修改定位在范圍內(nèi)
    釘釘簽到打卡位置修改神器,2026怎么修改定
  • 短信驗(yàn)證碼 寵物飼養(yǎng) 十大衛(wèi)浴品牌排行 suno 豆包網(wǎng)頁版入口 wps 目錄網(wǎng) 排行網(wǎng)

    關(guān)于我們 | 打賞支持 | 廣告服務(wù) | 聯(lián)系我們 | 網(wǎng)站地圖 | 免責(zé)聲明 | 幫助中心 | 友情鏈接 |

    Copyright © 2025 hfw.cc Inc. All Rights Reserved. 合肥網(wǎng) 版權(quán)所有
    ICP備06013414號(hào)-3 公安備 42010502001045

    国产人妻人伦精品_欧美一区二区三区图_亚洲欧洲久久_日韩美女av在线免费观看
    欧美 日韩 亚洲 一区| 久久在线免费观看视频| 国产精品福利在线观看| 国产suv精品一区二区三区88区| 高清欧美性猛交| 国产尤物91| 日韩av黄色网址| 亚洲中文字幕无码不卡电影| 久国内精品在线| 国产99视频在线观看| 久久精品日韩| 久久人人97超碰人人澡爱香蕉| 久久综合九色综合88i| 国产成人极品视频| 日韩专区中文字幕| 国产精品久久久久久久免费大片| 国产精品乱码视频| 9191国产视频| 91久久夜色精品国产网站| 91av一区二区三区| 国产ts人妖一区二区三区| 国产成人无码一二三区视频| 久久精品中文字幕| 91精品国产综合久久久久久丝袜 | 日本一区二区三不卡| 国产精品欧美激情在线观看| 国产精品嫩草影院久久久| 国产精彩精品视频| 久久久久久久久久av| 91精品国产777在线观看| 欧美久久电影| 天堂资源在线亚洲视频| 久久夜色精品国产亚洲aⅴ| 精品久久久久久无码国产| 在线观看免费黄色片| 日本久久久久亚洲中字幕| 欧美xxxx黑人又粗又长密月| 成人精品久久久| 国产拍精品一二三| 91精品在线观| 国产精品高精视频免费| 亚洲精品国产精品国自产| 欧美伦理91i| 一区二区免费在线视频| 人人干视频在线| 国产日韩亚洲欧美在线| 欧美精品卡一卡二| 国产欧美日韩中文字幕在线| 欧美精品久久| 日本阿v视频在线观看| 国产在线一区二区三区播放| 国产成人激情视频| 久久精品第九区免费观看| 国产精品入口免费视| 亚洲精品一区二| 国产在线精品自拍| 久久久久人妻精品一区三寸| 久久久亚洲影院| 91精品国产自产91精品| 国产精品日本精品| 国产精品久久久久久av福利| 亚洲欧洲日本国产| 狠狠色综合网站久久久久久久| 国产精品69久久久久| 国产经典一区二区三区| 国产精品国产三级欧美二区| 日韩.欧美.亚洲| 日韩久久久久久久久久久久| 日本欧美黄网站| 国产欧美日韩精品专区| 日韩亚洲精品电影| 久久久国产91| 欧美一级片免费播放| 欧美在线视频观看免费网站| 91麻豆精品秘密入口| 精品国产综合区久久久久久| 精品人伦一区二区三区| 色噜噜狠狠色综合网图区 | 日韩精品无码一区二区三区免费| 日本一区二区在线播放| 国产日韩精品一区二区| 国产精品久久久久影院日本| 欧美人成在线视频| 欧美日韩一区二区三区在线视频 | 亚洲精品国产精品国自产| 色综合视频二区偷拍在线| 国产欧美日韩综合一区在线观看| 国产精品美女久久久免费| 一区二区三区四区国产| 亚洲AV无码成人精品一区| 日韩精品免费一区| 欧美久久电影| 色伦专区97中文字幕| 一区二区三区一级片| 亚洲黄色网址在线观看| 日韩免费一级视频| 国产成人精品电影| 国产精品免费视频一区二区| 国产精品人人做人人爽| 欧美日韩国产不卡在线看| 日本成人在线不卡| 91精品久久久久久久久| 国产精品区免费视频| 欧美亚洲日本黄色| 久久久成人精品视频| 亚洲成色www久久网站| 青青视频免费在线观看| 久久艹国产精品| 欧美精品电影在线| 国产女主播自拍| 亚洲最大福利网站| 国产极品尤物在线| 国产精品入口免费视| 女女同性女同一区二区三区91| 福利视频一二区| 色噜噜狠狠狠综合曰曰曰| 人偷久久久久久久偷女厕| 国产成人欧美在线观看| 狠狠色综合欧美激情| 欧美精品在线观看| 91九色偷拍| 欧美精品久久久久久久免费| 久久男人资源视频| 另类色图亚洲色图| 日本中文字幕成人| 国产乱码精品一区二区三区不卡| 自拍另类欧美| 国模精品一区二区三区| 国产成人+综合亚洲+天堂| 日韩免费观看高清| 国产精品9999| 欧美精品与人动性物交免费看| 久久久久国产精品熟女影院| 美女扒开尿口让男人操亚洲视频网站| 国产精品自产拍在线观看中文| 欧美一区二区三区图| 国产精品视频入口| 91精品综合视频| 黄色国产小视频| 国产精品嫩草影院久久久| 国产伦精品一区二区三区精品视频| 婷婷久久青草热一区二区| 久久视频这里只有精品| 欧美性视频在线播放| 国产福利一区视频| 国模吧无码一区二区三区| 天天爱天天做天天操| 不卡av电影院| 久久精品国产99精品国产亚洲性色 | 亚洲永久一区二区三区在线| 国产淫片免费看| 国产精品入口日韩视频大尺度| 俄罗斯精品一区二区三区| 欧美精品久久久久a| 国产成人精品一区二区三区| 欧美日韩亚洲一| 国产精品视频1区| 成人精品久久av网站| 欧美极品色图| 日韩中文一区| 一区二区三区av| 成人444kkkk在线观看| 国产女同一区二区| 日本精品久久中文字幕佐佐木| 久久久久久18| 久久天天东北熟女毛茸茸| 国产日韩欧美在线看| 欧美影院久久久| 
亚洲综合在线做性| 久久在线免费观看视频| 久久精品成人欧美大片古装| 国内自拍中文字幕| 日本欧美一级片| 国产精品视频网址| 久久久噜噜噜www成人网| 91av在线精品| 91传媒视频免费| 欧美一区深夜视频| 日本一区高清不卡| 欧美一区二区激情| 久久香蕉国产线看观看av| 久久久久久久久久久久久久国产| 成人毛片网站| 国产日韩av在线| 无码内射中文字幕岛国片| 日韩最新免费不卡| 久久精品美女| 国产二区视频在线播放| 国产特级黄色大片| 美媛馆国产精品一区二区| 免费高清在线观看免费| 精品一区二区三区国产| 国模吧一区二区三区| 色一情一乱一乱一区91| 国产精品免费久久久久影院| www国产91| 97欧美精品一区二区三区| 欧洲精品久久| 不卡av电影在线观看| 国产精品久久国产精品99gif |