Date: 2024-10-12  Source: hfw.cc  Author: hfw.cc



Fall 2024 
CS 259 Lab 1 
Accelerating Convolutional Neural Network (CNN) on FPGAs using 
Merlin Compiler 
Due October 9 11:59pm 
Description 
Your task is to accelerate the computation of two layers in a convolutional neural network 
(CNN) using a high-level synthesis (HLS) tool on an FPGA. We encourage you to start with 
using the Merlin Compiler. For an input image with 228 × 228 pixels and 256 channels, you 
are going to calculate the tensor after going through a 2D convolution layer and a 2D max 
pooling layer. The convolution layer has 256 filters of shape 256 × 5 × 5 and applies the ReLU 
activation relu(x) = max{x, 0} after adding a bias value to each output channel. The 2D max 
pooling layer operates on 2 × 2 non-overlapping windows. You will need to implement this 
function using HLS: 
void CnnKernel(const float* input, const float* weight, const float* bias,
               float* output)
where input is the input image of size [256][228][228], weight stores the weights of the 
convolution filters of size [256][256][5][5], bias stores the offset values of size [256] that 
are added to the output channels, and output is the buffer you must fill with the result of 
maxpool(relu(conv2d(input, weight) + bias)), as defined above. The output 
size is [256][112][112]. 
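To make the definition above concrete, here is a minimal sequential sketch of maxpool(relu(conv2d(input, weight) + bias)). It is not the starter-kit code: the function name CnnReference, the size parameters, and the use of std::vector are hypothetical, and it assumes the usual CNN convention of an unpadded, stride-1 convolution with no kernel flip. With c_in = c_out = 256, in_size = 228, and k = 5, it produces the [256][112][112] output described above.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical reference, not the starter-kit kernel.
// Layouts (flattened, row-major):
//   input [c_in][in_size][in_size], weight [c_out][c_in][k][k], bias [c_out],
//   output [c_out][out_size][out_size] with out_size = (in_size - k + 1) / 2.
std::vector<float> CnnReference(const std::vector<float>& input,
                                const std::vector<float>& weight,
                                const std::vector<float>& bias,
                                int c_in, int c_out, int in_size, int k) {
  const int conv_size = in_size - k + 1;  // unpadded, stride-1 convolution
  const int out_size = conv_size / 2;     // 2 x 2 non-overlapping pooling
  std::vector<float> output(static_cast<std::size_t>(c_out) * out_size * out_size);

  for (int o = 0; o < c_out; ++o) {
    for (int h = 0; h < out_size; ++h) {
      for (int w = 0; w < out_size; ++w) {
        float best = 0.0f;  // ReLU results are >= 0, so 0 is a safe initial max
        for (int dh = 0; dh < 2; ++dh) {
          for (int dw = 0; dw < 2; ++dw) {
            float acc = bias[o];
            for (int c = 0; c < c_in; ++c) {
              for (int p = 0; p < k; ++p) {
                for (int q = 0; q < k; ++q) {
                  const std::size_t y = 2 * h + dh + p;
                  const std::size_t x = 2 * w + dw + q;
                  acc += input[(static_cast<std::size_t>(c) * in_size + y) * in_size + x] *
                         weight[((static_cast<std::size_t>(o) * c_in + c) * k + p) * k + q];
                }
              }
            }
            best = std::max(best, std::max(acc, 0.0f));  // ReLU, then max pool
          }
        }
        output[(static_cast<std::size_t>(o) * out_size + h) * out_size + w] = best;
      }
    }
  }
  return output;
}
```

A tiny case is easy to check by hand: a single 4 × 4 channel of ones, a single 3 × 3 filter of ones, and a bias of 0.5 give a 2 × 2 convolution result of 9.5 everywhere, which pools to one value of 9.5.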
How-To 
FPGA accelerator compilation typically involves three (3) stages: high-level synthesis (HLS), 
bitstream generation, and onboard execution. The last two stages can take days to 
complete. Therefore, in this lab, we only focus on the first stage: HLS. Your performance will 
only be assessed using the estimation in the HLS reports, which is usually accurate. 
However, you are welcome to try out the last two steps if you are interested. 
 
 
 
Connecting to the Server: Method 1 
In this method, you won’t be able to run Merlin directly from your /home directory, so you’ll need 
to copy files back and forth. 
1. Connect to the server (VPN may be required). You can find VPN details here: 
https://www.it.ucla.edu/it-support-center/services/virtual-private-network-vpn-clients  
ssh <username>@brimstone.cs.ucla.edu 
 
2. Start the Docker container and share your home directory with -v: 
 
docker run -v /d0/class/:/home -it vitis2021 /bin/bash 
 
3. Source Vitis, navigate to the desired directory and clone the repository: 
 
source /tools/Xilinx/Vitis_HLS/2021.1/settings64.sh 
cd /opt 
git clone https://github.com/UCLA-VAST/cs-259-f24.git 
cd cs-259-f24/lab1 
 
4. Copy the necessary file to your home directory: 
 
cp /opt/cs-259-f24/lab1/cnn-krnl.cpp /home/<username> 
Connecting to the Server: Method 2 
In this method, you can run Merlin directly from your /home directory, but make sure to export your 
home directory. 
 
1. Connect to the server (VPN may be required). You can find VPN details here: 
https://www.it.ucla.edu/it-support-center/services/virtual-private-network-vpn-clients 
 
ssh <username>@brimstone.cs.ucla.edu 
 
2. Start the Docker container and share your home directory with -v: 
 
docker run --user $(id -u):100 -v /d0/class/:/home -it vitis2021 /bin/bash 
 
3. Export your home directory: 
 
export HOME=/home/<username> 
 
4. Source Vitis, navigate to your home directory and clone the repository: 
 
source /tools/Xilinx/Vitis_HLS/2021.1/settings64.sh 
cd /home/<username> 
git clone https://github.com/UCLA-VAST/cs-259-f24.git 
cd cs-259-f24/lab1 
Build and Run Baseline with Software Simulation 
We have prepared a starter kit for you. Please run: make 
This command performs a software simulation of the provided starter FPGA HLS kernel. It 
should show “PASS”. You need to use the FPGA Developer AMI in this lab unless you are using 
a computer with Xilinx Vitis HLS installed. However, we still suggest that you develop code 
and run software simulation locally to test correctness. You can move to AWS once you 
enter the tuning stage. 
Understand Merlin’s Automatic Optimizations 
Before modifying the kernel and adding pragmas, synthesize the CNN kernel with Merlin and 
describe in your report the automatic optimizations made by Merlin and how they reduce 
latency. 
Modify the HLS CNN Kernel 
If you have successfully built and run the baseline HLS CNN kernel, you can now optimize 
the code to design your CNN kernel. Your task is to implement a fast, parallel version of the 
CNN kernel on FPGA. You should start with the provided starter kit and edit cnn-krnl.cpp 
for this task. When editing, please use the given types input_t, weight_t, bias_t, 
and output_t for the corresponding data, and compute_t for your intermediate values. 
You can use them as if they were float numbers. 
Parallelism should be exploited by using Merlin pragmas and tiling. You are encouraged to 
focus on Merlin pragmas (#pragma ACCEL parallel, #pragma ACCEL pipeline and #pragma 
ACCEL tile). You can also modify the code explicitly (tiling, loop permutation, …), but make sure 
the modified code is still correct. 
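As a rough illustration of where the three Merlin pragmas go, the sketch below annotates a toy matrix-vector kernel. The function MatVecBias, its sizes, and the factors are hypothetical and are not tuned for this lab; how Merlin actually applies each pragma should be checked in merlin.rpt. A regular C++ compiler ignores the pragmas, so the code can still be checked in software.

```cpp
constexpr int kRows = 8;   // hypothetical sizes, chosen only for the example
constexpr int kCols = 16;

// Toy kernel: y = relu(a * x + bias). Not part of the starter kit.
void MatVecBias(const float a[kRows][kCols], const float x[kCols],
                const float bias[kRows], float y[kRows]) {
  // Merlin pragmas are written immediately before the loop they apply to.
#pragma ACCEL tile factor=4        // request tiling of the i loop by 4
  for (int i = 0; i < kRows; ++i) {
    float acc = bias[i];
#pragma ACCEL parallel factor=4    // request 4 parallel copies of the j body
    for (int j = 0; j < kCols; ++j) {
      acc += a[i][j] * x[j];
    }
    y[i] = acc;
  }

#pragma ACCEL pipeline             // request pipelining of the ReLU loop
  for (int i = 0; i < kRows; ++i) {
    y[i] = y[i] > 0.0f ? y[i] : 0.0f;
  }
}
```

All loop bounds are compile-time constants, and the factors divide the trip counts evenly, which keeps the transformed loops simple.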
In the starter kit, we simply wrap a sequential CNN code with #pragma ACCEL kernel, and 
Merlin automatically performs data caching, memory coalescing, pipelining, and 
parallelization, which yields about 10 GFLOPS. 
Although the skeleton kernel is provided, you are also free to create your own by removing 
the inclusion of the header file “lib/cnn-krnl.h” and implementing the basic kernel from scratch. 
However, this requires specific expertise in Xilinx FPGA architecture and is not 
recommended for this course. 
Test Your HLS CNN Kernel with Software Simulation 
To perform software emulation of your FPGA implementation of CNN kernel: 
make 
If you see something similar to the following message, your implementation is incorrect. 
Found 21201** errors 
FAIL 
Since the software simulation step uses the CPU to emulate the hardware behavior, it only 
serves as a correctness test, and its execution time doesn’t reflect that of actual hardware. Your 
estimated execution time should be retrieved using the command below: 
make estimate 
This command will print out the estimated latency and resource usage of your kernel: 
+---------------------------+------------------------+----------+----------+---------+--------+-------+------+
|          Kernel           |         Cycles         |   LUT    |    FF    |  BRAM   |  DSP   | URAM  |Detail|
+---------------------------+------------------------+----------+----------+---------+--------+-------+------+
|CnnKernel (cnn-krnl.cpp:12)|4179564052 (16718.256ms)|49558 (4%)|49381 (2%)|810 (18%)|202 (2%)|25 (2%)|-     |
+---------------------------+------------------------+----------+----------+---------+--------+-------+------+
The time shown in parentheses in the Cycles column (16718.256 ms above) is the estimated 
execution time of your FPGA kernel. To get the performance in GFLOPS, compute 
“kNum*kNum*kImSize*kImSize*kKernel*kKernel*2/latency”, or equivalently 
164.4/latency (with latency in seconds). 
IMPORTANT: Please make sure that all your loops have fixed loop bounds. If any of the loop 
bounds are variable, a performance estimation will not be shown and you will receive no 
performance grade. 
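As an illustration of the difference, the sketch below contrasts a loop whose trip count depends on a run-time value with one whose bound is a compile-time constant. The names SumVariable, SumFixed, and kMaxLen are hypothetical. In this lab every dimension is a known constant, so using those constants directly as loop bounds is normally enough; the guard pattern only matters when the amount of useful work varies.

```cpp
constexpr int kMaxLen = 64;  // hypothetical compile-time upper bound

// Variable bound: the trip count depends on the run-time value n, so a
// cycle estimate cannot be computed statically for this loop.
float SumVariable(const float* data, int n) {
  float sum = 0.0f;
  for (int i = 0; i < n; ++i) {
    sum += data[i];
  }
  return sum;
}

// Fixed bound: the loop always runs kMaxLen iterations, and a guard skips
// the iterations beyond n (requires n <= kMaxLen).
float SumFixed(const float* data, int n) {
  float sum = 0.0f;
  for (int i = 0; i < kMaxLen; ++i) {
    if (i < n) {
      sum += data[i];
    }
  }
  return sum;
}
```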
IMPORTANT: The “make estimate” command should finish within 30 minutes, or within two hours 
with highly complex optimizations. We recommend halting your estimation with 
Ctrl-C when it exceeds 30 minutes, except for your last step (after you reach ~100 
GOPS). An estimation that takes more than 12 hours will result in zero for the performance score. 
As your kernel design becomes more complex, the software simulation and the estimation 
will start to take a significantly longer time. 
IMPORTANT: As you apply more optimizations, your resource usage will also increase. 
Ideally, you should keep applying optimization until your kernel occupies about 80% of these 
resources. The remaining 20% should be reserved for the interfaces (DRAM/PCI-e controller) 
and the downstream flows. Please make sure that resource utilization is less than 80% for all 
FPGA resources. If any of the resources are over this limit, you will receive no performance 
grade. 
IMPORTANT: You can check the HLS report by opening merlin.rpt with a text editor. This 
file is generated by the command make estimate. You must submit this file with your 
final submission. Do not modify this file in your submission; all reports will be verified 
after the submission deadline. Any modification to this file in your submission constitutes academic 
misconduct and will be reported. 
Advanced Tips for HLS 
Kernel Profiling: If you want to “profile” your kernel, you can open merlin.rpt with a text 
editor and scroll down to Performance Estimate. You can see the trip count, accumulated 
cycles and cycles per call, as well as pipeline initiation interval and parallel factor for each 
loop in the table. For resource usage, you can go to Resource Estimate. No loop level 
information is available, though. If you want to check the resource usage of a code region, 
you can wrap it with a function then run again. 
Kernel after transformation: If you want to see the kernel after being transformed by Merlin, 
you can look for that in .merlin_prj/run/implement/exec/hls/kernel. 
Annotation for Profiling: If you find the loops in your report hard to read, you can name the 
loops you are interested in using a goto label. For example, 
this_loop: for (int i = 0; i < n; i++); 
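A slightly fuller sketch of the labeling idea, applied to a nested loop, is shown below. The function Transpose and the label names row_loop and col_loop are hypothetical; the labels are ordinary C++ goto labels that are never jumped to, so a compiler may warn about unused labels.

```cpp
constexpr int kN = 4;  // hypothetical size for the example

// Each loop carries a label so that it is easier to identify in the report.
void Transpose(const float in[kN][kN], float out[kN][kN]) {
row_loop:
  for (int i = 0; i < kN; ++i) {
  col_loop:
    for (int j = 0; j < kN; ++j) {
      out[j][i] = in[i][j];
    }
  }
}
```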
Debugging Pipelining: If you are not sure about why you cannot achieve a specific initiation 
interval as you expected, you can open the file below and read the logs. HLS usually gives out 
a reason. 
.merlin_prj/run/implement/exec/hls/_x/logs/CnnKernel/CnnKernel/vitis_hls.log 
Long Synthesis Time In Pipelining: You will experience long HLS synthesis time (for 
generating the estimation) if you pipeline a loop with a large loop body. Note also 
that all loops inside a pipelined loop are unrolled, so a loop body can become large without 
you noticing. In this case, you may want to swap the order of pipelining and unrolling and see 
whether the synthesis time improves. 
Use Functions for Shorter Synthesis Time: If you experience long synthesis time, you may try 
wrapping some loops into a function and specifying #pragma HLS inline off inside the 
function body. However, this may lead to inaccurate dependency analysis or memory port 
analysis and sometimes causes lower performance. Workarounds may or may not exist. 
For example, if you access A[k + i][j] inside the function, passing A + k to 
the function and accessing A’[i][j] can allow HLS to understand the array partitioning 
better than passing A. You will need to experiment. 
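The A + k idea can be sketched as below. The names SumTile and SumAll and all sizes are hypothetical; the point is only that the callee indexes tile[i][j] relative to the pointer it receives, instead of receiving the whole array plus an offset k. Whether this actually helps the array partitioning analysis has to be verified in the HLS report.

```cpp
constexpr int kTile = 4;     // hypothetical tile height
constexpr int kWidth = 8;    // hypothetical row width
constexpr int kHeight = 16;  // hypothetical number of rows

// Sums one kTile x kWidth tile. The caller passes a pointer to the tile's
// first row, so the body uses tile[i][j] rather than A[k + i][j].
float SumTile(const float (*tile)[kWidth]) {
#pragma HLS inline off
  float sum = 0.0f;
  for (int i = 0; i < kTile; ++i) {
    for (int j = 0; j < kWidth; ++j) {
      sum += tile[i][j];
    }
  }
  return sum;
}

float SumAll(const float A[kHeight][kWidth]) {
  float total = 0.0f;
  for (int k = 0; k < kHeight; k += kTile) {
    total += SumTile(A + k);  // A + k points at row k of A
  }
  return total;
}
```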
General Tips 
● When you develop on AWS, to resume a session in case you lose your connection, you 
can run screen after login. You can recover your session with screen -DRR. You should 
stop your AWS instance if you are going to come back and resume your work in a few 
hours or days. Your data will be preserved but you will be charged for the EBS storage 
for $0.10 per GB per month (with default settings). You should terminate your instance 
if you are not going to come back and resume your work. Data on the instance will be 
lost. 
● We recommend that you use private repositories provided by GitHub to back up your 
code. To avoid potential plagiarism, never put your code in a public repo. To check in 
your code to a private GitHub repo, create a repo first. 
git branch -m upstream 
git checkout -b main # skip these two lines if you are reusing the folder in Lab 1 
... // your modifications 
git add cnn-krnl.cpp merlin.rpt 
git commit -m "lab1: first version" # change commit message accordingly 
# please replace the URL with your own URL 
git remote add origin git@github.com:YourGitHubUserName/your-repo-name.git 
git push -u origin main 
● You are recommended to git add and git commit often so that you can keep track of 
the history and revert whenever necessary. 
● Make sure your code produces correct results! 
(Optional) Modify the HLS CNN Kernel using Vitis Pragmas 
You are encouraged to use mainly Merlin pragmas. If needed, you can use Vitis pragmas for 
finer-grained control and optimization. The list of pragmas in Vitis can be found here. 
You can simply write Vitis pragmas and Merlin pragmas in the same file (cnn-krnl.cpp), but note 
that, to apply an HLS pragma to a loop, you need to put the pragma inside the loop body 
instead of before it. 
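The placement difference can be sketched as follows. The function ScaleAndOffset, its size, and the chosen factor and initiation interval are hypothetical; the first loop carries a Merlin pragma before the loop, and the second carries a Vitis HLS pragma as the first line inside the loop body.

```cpp
constexpr int kLen = 32;  // hypothetical array length

void ScaleAndOffset(const float in[kLen], float out[kLen],
                    float factor, float offset) {
  // Merlin pragma: written before the loop it applies to.
#pragma ACCEL parallel factor=4
  for (int i = 0; i < kLen; ++i) {
    out[i] = in[i] * factor;
  }

  // Vitis HLS pragma: written inside the body of the loop it applies to.
  for (int i = 0; i < kLen; ++i) {
#pragma HLS pipeline II=1
    out[i] = out[i] + offset;
  }
}
```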
Submission 
You need to report the estimated performance results of your FPGA-based implementation on 
a Xilinx Ultrascale+ VU9P FPGA (the FPGA we are using, specified in the makefile). Please 
express your performance in GFLOPS and the speedup compared with the starter-kit version. 
Your report should also include: 
● Please run the input C file through the Merlin Compiler, identify the code 
transformation and HLS pragmas that Merlin added, and discuss why. 
● Please explain the parallelization and optimization strategies you have applied for 
each loop in the CNN program (convolution, max pooling, etc) in this lab. Include the 
pragmas (if any) or code segments you have added to achieve your strategy. 
● Please incrementally evaluate each parallelization/optimization that you have applied 
and explain why it improves the performance. 
● Please report the FPGA resource usage (LUT/FF/DSP/BRAM), in terms of resource 
count and percentage of the total. Which resource has been used most, in terms of 
percentage? 
● Optional: The challenges you faced, and how you overcame them. 
● (Bonus +5pts): Analyze your code and check if the DSP/BRAM resource usage 
matches your expectation. Only the adders, multipliers, and size of arrays need to be 
considered. Please attach related code segments to your report and show how you 
computed the expected number. Provide a discussion on possible reasons if they 
differ significantly. 
You also need to submit your optimized kernel code. Do not modify code in the lib directory. 
Please submit on Gradescope. Your final submission should contain exactly these 
files, uploaded individually: 
 ├ cnn-krnl.cpp 
 ├ merlin.rpt 
 └ lab**report.pdf 
File lab**report.pdf must be in PDF format. 
Grading Policy 
Your submission will only be graded if it complies with the formatting requirements. 
Missing reports/code or compilation errors will result in 0 for the corresponding 
category(ies). 
Correctness (40%) 
Please check the correctness using the command “make”. 
Performance (40%) 
Your performance will be evaluated based on the estimation report generated using the 
command “make estimate”. The performance point will be added only if you have the 
correct result, so please prioritize the correctness over performance. Your performance will 
be evaluated based on the ranges of throughput (GFLOPS). Ranges A+ and A++ will be defined 
after all the submissions are graded: 
● Range A++, better than Range A+ performance: 40 points + 20 points (bonus) 
● Range A+, better than Range A performance: 40 points + 10 points (bonus) 
● Range A GFLOPS [200, 280]: 40 points 
● Range B GFLOPS [120, 200): 30 points 
● Range C GFLOPS [60, 120): 20 points 
● Range D GFLOPS [30, 60): 10 points 
● Range F GFLOPS [0, 30): 0 points 
 
Report (20%) 
Points may be deducted if your report misses any of the sections described above. 
Academic Integrity 
All work is to be done individually, and any sources of help are to be explicitly cited. You must 
not modify the HLS report merlin.rpt in your submission. Any instance of academic 
dishonesty will be promptly reported to the Office of the Dean of Students. Academic 
dishonesty includes, but is not limited to, cheating, fabrication, plagiarism, copying code from 
other students or from the internet, modifying the software-generated report, or facilitating 
academic misconduct. We’ll use automated software to identify similar sections between 
different student programming assignments, against previous students’ code, or against 
Internet sources. We’ll run HLS on all submissions and compare the reproduced HLS 
report with the submitted report. Students are not allowed to post the lab solutions on public 
websites (including GitHub). Please note that any version of your submission must be your 
own work and will be compared with sources for plagiarism detection. 
Late policy: Late submissions will be accepted for 24 hours after the deadline with a 10% penalty. No late 
submission will be accepted after that (you lose all points after the late submission time). 
