ECE 498/598 Fall 2024, Homeworks 3 and 4
Remarks:
1. HW3&4: You can reduce the context length to ** if you are having trouble with the
training time.
2. HW3&4: During test evaluation, note that positional encodings for unseen/long
contexts are not trained. You should evaluate the model as is; it is OK if it does not
work well.
3. HW3&4: Comments are an important component of the HW grade. You are expected
to explain the experimental findings. If you don’t provide technically meaningful
comments, you might receive a lower score even if your code and experiments are
accurate.
4. The deadline for HW3 is November 11th at 11:59 PM, and the deadline for HW4 is
November 18th at 11:59 PM. For each assignment, please submit both your code and a
PDF report that includes your results (figures) for each question. You can generate the
PDF report from a Jupyter Notebook (.ipynb file) by adding comments in markdown
cells.
The objective of this assignment is to compare the transformer architecture with SSM-type
architectures (specifically Mamba [1]) on the associative recall problem. We provide
example code in recall.ipynb, which implements a 2-layer transformer. You will adapt
this code to incorporate different positional encodings, use Mamba layers, and modify
the dataset generation.
Background: As you recall from class, associative recall (AR) assesses two abilities of
a model: the ability to locate relevant information and the ability to retrieve the context
around that information. The AR task can be understood via the following question: given
the input prompt X = [a 1 b 2 c 3 b], we wish the model to locate where the last token b
occurred earlier and output the associated value Y = 2. This is crucial for memory-related
tasks and bigram retrieval (e.g., 'Baggins' should follow 'Bilbo').
To proceed, let us formally define the associative recall task we will study in this HW.
Definition 1 (Associative Recall Problem) Let Q be the set of target queries with cardinality
|Q| = k. Consider a discrete input sequence X of the form X = [. . . q v . . . q], where the
query q appears exactly twice in the sequence and the value v follows the first appearance
of q. We say the model f solves AR(k) if f(X) = v for all sequences X with q ∈ Q.
The induction head is a special case of the definition above where the query q is fixed (i.e.,
Q is a singleton); it is visualized in Figure 1. On the other extreme, we can ask the model
to solve AR for all queries in the vocabulary.
Problem Setting
Vocabulary: Let [K] = {1, . . . , K} be the token vocabulary. Obtain the vocabulary
embedding by randomly generating a K × d matrix V with IID N(0, 1) entries, then
normalizing its rows to unit length. Here d is the embedding dimension, and the embedding
of the i-th token is V[i]. Use numpy.random.seed(0) to ensure reproducibility.
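A minimal sketch of this construction in NumPy (the function name make_embeddings is ours, for illustration):

```python
import numpy as np

np.random.seed(0)  # required for reproducibility

def make_embeddings(K, d):
    """Return a K x d matrix with IID N(0,1) entries, rows normalized to unit length."""
    V = np.random.randn(K, d)
    V /= np.linalg.norm(V, axis=1, keepdims=True)
    return V  # the embedding of token i is V[i]
```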
Experimental variables: For the AR task, Q will simply be the first M elements of the
vocabulary. During experiments, K, d, and M are under our control. Besides these, we will
also vary two other variables:
• Context length: We will train the models with context length up to L but evaluate
with contexts up to length 3L. This tests the generalization of the model to unseen
lengths.
• Delay: In the basic AR problem, the value v immediately follows q. Instead, we will
introduce a delay variable τ such that v appears τ tokens after q; τ = 1 is the standard
setting.
Models: The motivation behind this HW is reproducing the results in the Mamba paper.
However, we will also go beyond their evaluations and identify weaknesses of both the
transformer and Mamba architectures. Specifically, we will consider the following models
in our evaluations:
Figure 1: We will work on the associative recall (AR) problem. The AR problem requires
the model to retrieve the value associated with any query, whereas the induction head
requires the same for one specific query; thus, the latter is an easier problem. The figure is
taken directly from the Mamba paper [1]. The yellow-shaded regions highlight the focus of
this homework.
• Transformer: We will use the transformer architecture with 2 attention layers (no
MLP). We will try the following positional encodings: (i) learned PE (provided code),
(ii) rotary PE (RoPE), and (iii) NoPE (no positional encoding).
• Mamba: We will use the Mamba architecture with 2 layers.
• Hybrid Model: We will use an initial Mamba layer followed by an attention layer,
with no positional encoding (see the sketch after this list).
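A minimal sketch of the hybrid model, assuming the Mamba layer from the suggested mamba-ssm package (which typically requires a CUDA device); the class name, head count, and readout are our illustrative choices, not the provided code:

```python
import torch
import torch.nn as nn
from mamba_ssm import Mamba  # from the suggested repo; assumed installed

class HybridModel(nn.Module):
    """An initial Mamba layer followed by one causal attention layer, no PE."""
    def __init__(self, d_model, vocab_size, n_heads=1):
        super().__init__()
        self.mamba = Mamba(d_model=d_model)  # default d_state/d_conv/expand
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.readout = nn.Linear(d_model, vocab_size)

    def forward(self, x):  # x: (batch, seq, d_model) token embeddings
        x = self.mamba(x)
        # boolean causal mask: True marks positions attention may not look at
        causal = torch.triu(torch.ones(x.size(1), x.size(1), dtype=torch.bool,
                                       device=x.device), diagonal=1)
        x, _ = self.attn(x, x, x, attn_mask=causal, need_weights=False)
        return self.readout(x[:, -1])  # predict the value from the last position
```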
Hybrid architectures are inspired by the Mamba paper as well as [2], which observes the
benefit of starting the model with a Mamba layer. You should use public GitHub repos to
find implementations (e.g., RoPE encoding or a Mamba layer). As a suggestion, you can use
this GitHub Repo for the Mamba model.
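For the RoPE variant, a minimal sketch in PyTorch is below; it assumes queries and keys of shape (batch, heads, seq, head_dim) with an even head_dim, and should be applied to both q and k just before the attention-score computation. The function name is ours, and this is an illustration rather than the provided code:

```python
import torch

def apply_rope(x, base=10000.0):
    """Rotate channel pairs of x by position-dependent angles so that
    attention scores depend only on relative positions (rotate-half RoPE)."""
    *_, seq_len, dim = x.shape
    half = dim // 2  # head_dim is assumed even
    freqs = base ** (-torch.arange(half, dtype=torch.float32) / half)     # (half,)
    angles = torch.arange(seq_len, dtype=torch.float32)[:, None] * freqs  # (seq, half)
    cos, sin = angles.cos(), angles.sin()  # broadcast over batch and heads
    x1, x2 = x[..., :half], x[..., half:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)
```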
Generating the training dataset: During training, train with minibatch SGD (e.g., with
batch size 64) until satisfactory convergence. Given (K, d, M, L, τ), you can generate the
training sequences for AR as follows (a code sketch appears after this list):
1. Training sequence length is equal to L.
2. Sample a query q ∈ Q and a value v ∈ [K] uniformly at random and independently.
Recall that the size of Q is |Q| = M.
3. Place q at the end of the sequence and place another q at an index i chosen uniformly
at random from 1 to L − τ.
4. Place value token at the index i + τ.
5. Sample the other tokens IID from [K] \ {q}, i.e., the remaining tokens are drawn
uniformly at random but are never equal to q.
6. Set label token Y = v.
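A minimal NumPy sketch of steps 1-6 is below; positions are 0-indexed, the function and variable names are ours, and we sample the first occurrence of q so that the value cannot land on the trailing query (a conservative reading of step 3):

```python
import numpy as np

rng = np.random.default_rng(0)

def make_sequence(K, M, L, tau=1):
    """Sample one AR training sequence (token ids in 0..K-1) and its label."""
    q = rng.integers(0, M)            # step 2: query from Q = first M tokens
    v = rng.integers(0, K)            # step 2: value from the full vocabulary
    # step 5: fill with tokens drawn uniformly from [K] \ {q}
    seq = rng.integers(0, K - 1, size=L)
    seq[seq >= q] += 1                # remap so q never appears by chance
    i = rng.integers(0, L - tau - 1)  # step 3: i + tau stays clear of the last slot
    seq[i] = q                        # first occurrence of the query
    seq[i + tau] = v                  # step 4: value tau tokens later
    seq[-1] = q                       # step 3: query repeated at the end
    return seq, v                     # step 6: label Y = v
```

A training batch can then be formed by stacking, say, 64 such sequences and mapping the token ids through the embedding matrix V.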
Test evaluation: The test distribution is the same as above, but we will evaluate on all
sequence lengths up to 3L, starting from τ + 2, which is the shortest possible sequence.
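A sketch of this sweep, assuming the make_sequence helper above and a hypothetical model.predict that returns the predicted value token for each sequence in a batch:

```python
import numpy as np

def eval_length_generalization(model, K, M, L, tau=1, batch_size=64):
    """Accuracy over a test batch at every length from tau + 2 up to 3L."""
    accs = {}
    for length in range(tau + 2, 3 * L + 1):
        pairs = [make_sequence(K, M, length, tau) for _ in range(batch_size)]
        seqs = np.stack([s for s, _ in pairs])
        labels = np.array([v for _, v in pairs])
        preds = model.predict(seqs)  # hypothetical helper returning a batch of tokens
        accs[length] = float((preds == labels).mean())
    return accs
```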
Empirical evidence from the Mamba paper: Table 2 of [1] demonstrates that Mamba does
a good job on the induction head problem, i.e., AR with a single query. Additionally, Mamba
is the only model that exhibits length generalization: even if you train it up to context
length L, it can still solve AR for context lengths beyond L. On the other hand, since Mamba
is inherently a recurrent model, it may not solve the AR problem in its full generality. This
motivates the question: what are the tradeoffs between Mamba and transformers, and can
hybrid models improve performance over both?
Your assignments are as follows. For each problem, make sure to submit the associated
code. The code can be organized as separate, clearly commented cells in a single
Jupyter/Python file.
Grading structure:
• Problem 1 will count as your HW3 grade. This only involves Induction Head
experiments (i.e. M = 1).
• Problems 2 and 3 will count as your HW4 grade.
• You will make a single submission.
Problem 1 (50=25+15+10pts). Set K = 16, d = 8, L = ** or L = 64.
• Train all models on the induction heads problem (M = 1, τ = 1). After training,
evaluate the test performance and plot the accuracy of all models as a function of
the context length (similar to Table 2 of [1]). In total, you will be plotting 5 curves
(3 Transformers, 1 Mamba, 1 Hybrid). Comment on the findings and compare the
performance of the models including length generalization ability.
• Repeat the experiment above with delay τ = 5. Comment on the impact of delay.
• Which models converge faster during training? Provide a plot of the convergence rate
where the x-axis is the number of iterations and the y-axis is the AR accuracy over a
test batch. Make sure to specify the batch size you are using (ideally use ** or 64).
Problem 2 (30pts). Set K = 16, d = 8, L = ** or L = 64. We will train Mamba, Transformer
with RoPE, and Hybrid. Set τ = 1 (standard AR).
• Train Mamba models for M = 4, 8, 16. Note that M = 16 is the full AR (retrieve any
query). Comment on the results.
• Train Transformer models for M = 4, 8, 16. Comment on the results and compare
them against Mamba’s behavior.
• Train the Hybrid model for M = 4, 8, 16. Comment and compare.
Problem 3 (20=15+5pts). Set K = 16, d = 64, L = ** or L = 64. We will only train
Mamba models.
• Set τ = 1 (standard AR). Train Mamba models for M = 4, 8, 16. Compare against the
corresponding results of Problem 2. How does the embedding dimension d impact the
results?
• Train a Mamba model with M = 16 and τ = 10. Comment on any differences.




    久草热视频在线观看| 久久久久久久有限公司| 久久久精品电影| 欧美精品在线一区| 国产精品高潮呻吟久久av野狼| 国精产品99永久一区一区| 国产成人精品午夜| 国产日韩亚洲精品| 日本午夜在线亚洲.国产| 久久精品99无色码中文字幕| 日韩中文在线中文网三级| 国产伦精品一区二区三区高清版| 五月天综合婷婷| 国产精品视频地址| 91精品国产91久久久久青草| 日本免费不卡一区二区| 免费不卡在线观看av| 91免费版网站入口| 欧美做受高潮1| 一区二区在线高清视频| 国产成人综合av| 精品91免费| 亚州精品天堂中文字幕| 日韩中文在线字幕| 一区二区视频国产| 日韩在线三区| 免费精品视频一区二区三区| 日本一区二区高清视频| 人人妻人人澡人人爽精品欧美一区| 久久国产天堂福利天堂| 九九九久久久| 久久精品久久久久久| 欧美日韩成人免费| 国产精品福利小视频| 欧美激情久久久久久| 日韩av大片在线| 欧美激情 国产精品| 国产精品一区二区久久国产| 久久久久国产精品视频| 国产精品视频久| 亚洲综合av影视| 欧美日韩高清区| 欧美一区二区三区综合| 极品粉嫩国产18尤物| 9a蜜桃久久久久久免费| 国产欧美日韩免费| 国产中文字幕91| 久久久爽爽爽美女图片| 99久久精品久久久久久ai换脸 | 国产成人精品视| 国产精品狠色婷| 日本精品国语自产拍在线观看| 精品少妇人欧美激情在线观看| 国产成年人在线观看| 国产高清在线一区二区| 久久av红桃一区二区小说| 日本一区免费观看| www.亚洲一区二区| 国产精品久久久久久久久久久新郎 | 精品国产一区三区| 日本国产中文字幕| 成人精品一区二区三区电影黑人 | 亚洲一区在线免费| 欧美日韩成人网| 三区精品视频| 不卡一区二区三区视频| 久久综合色88| 欧美日韩激情四射| 免费久久久一本精品久久区| 国产成人亚洲精品| 懂色av一区二区三区在线播放| 国产区一区二区| 国产精品二区三区| 欧美视频免费看欧美视频| 久久九九视频| 天堂一区二区三区| 国产高清视频一区三区| 午夜精品久久久久久久白皮肤| 国产日韩欧美二区| 精品国产无码在线| 国产一级黄色录像片| 国产精品久久久久影院日本| 僵尸世界大战2 在线播放| 日韩中文字幕第一页| 人人做人人澡人人爽欧美| 国产成人在线精品| 日本久久久久久| 国产av天堂无码一区二区三区| 色综合666| 国产v亚洲v天堂无码久久久| 日本一区二区在线免费播放| 国产精欧美一区二区三区| 亚洲 中文字幕 日韩 无码| 午夜欧美大片免费观看| 国产免费内射又粗又爽密桃视频| 欧美理论电影在线观看| 国产免费一区视频观看免费| 国产成人精品电影| 狠狠色综合一区二区| 日韩中文字幕网| 免费在线观看毛片网站| 国产精品黄色影片导航在线观看| 国产在线观看91精品一区| 色综合久久88色综合天天看泰| 国产一区二区网| 九九久久国产精品| 91精品在线观看视频| 岛国视频一区免费观看| 久久日韩精品| 欧美主播一区二区三区美女 久久精品人 | 久久久久天天天天| 女同一区二区| 欧美日韩国产成人在线| 国产精品96久久久久久| 久久av免费观看| 日韩精品无码一区二区三区免费| 日韩中文字幕网站| 免费av一区二区三区| 欧美激情小视频| 久久精品一区二区三区不卡免费视频| 三级网在线观看| 久久精品人人爽| 国产精选在线观看91| 欧美一级视频在线观看| 久久视频国产精品免费视频在线| 国产日韩欧美中文在线播放| 视频一区二区在线观看| 国产精品美女呻吟| 色综合电影网| 一区不卡视频| 91av免费看| 欧洲精品久久久| 欧美激情一级欧美精品| 国产mv免费观看入口亚洲| 精品免费一区二区三区蜜桃| 国产精品高清免费在线观看| 99在线观看| 黄色片视频在线播放| 亚洲最大福利网站| 久久精品国产亚洲精品| 99精品99久久久久久宅男| 欧美一区深夜视频| 亚洲视频欧美在线| 国产精品久久久久久久久久久久午夜片 | 国产综合动作在线观看| 视频在线99| 久久国产精品视频| 日韩有码视频在线| 91精品视频在线| 国产在线精品一区免费香蕉| 日韩精品一区二区三区电影| 亚洲欧美成人一区| 欧美精品在线极品| 日韩亚洲成人av在线| 99久久无色码| 国产欧美一区二区三区另类精品| 奇米精品一区二区三区| 亚州成人av在线| 欧美巨大黑人极品精男| 久久精品国亚洲| 国产不卡在线观看| 91精品久久久久久久久青青| 国产裸体免费无遮挡| 精品免费视频123区| 日韩免费在线看| 日本中文字幕成人| 亚洲一区二区三区四区中文| 色综合久久悠悠| 精品久久久久久一区二区里番 | 欧美成人第一页| 久久久精品欧美| 久久国产手机看片| 久久天天东北熟女毛茸茸| 成人av在线不卡| 国产免费观看久久黄| 国产亚洲精品网站| 国产在线观看欧美| 美媛馆国产精品一区二区| 黄色网页免费在线观看| 欧美日韩国产三区| 欧美一二三不卡| 免费在线观看一区二区| 欧美激情一区二区三区在线视频| 欧美午夜视频在线| 欧美视频免费播放| 精品一卡二卡三卡四卡日本乱码| 欧美性在线视频| 激情小说网站亚洲综合网| 黄色一级片播放| 国产日韩中文字幕| 成人久久精品视频| 99se婷婷在线视频观看| 国产精品av在线播放| 久久av综合网| 色噜噜狠狠狠综合曰曰曰88av| 日韩在线高清视频| 国产精品欧美久久| 精品国产一区二区三| 亚洲永久在线观看| 日韩亚洲不卡在线|