

Date: 2024-10-13  Source: hfw.cc  Author: hfw.cc



CS439: Introduction to Data Science Fall 2024 
 
Problem Set 1 
 
Due: 11:59pm Friday, October 11, 2024 
 
Late Policy: The homework is due on 10/11 (Friday) at 11:59pm. We will release the solutions 
of the homework on Canvas on 10/16 (Wednesday) at 11:59pm. If your homework is submitted to 
Canvas before 10/11 11:59pm, there will be no late penalty. If you submit to Canvas after 10/11 
11:59pm and before 10/16 11:59pm (i.e., before we release the solutions), your score will be 
penalized by a factor of 0.9^k, where k is the number of days of late submission. For example, if you 
submit on 10/14 and your original score is 80, then your final score will be 80 × 0.9^3 = 58.32 
for 14 − 11 = 3 days of late submission. If you submit to Canvas after 10/16 11:59pm (i.e., after we 
release the solutions), you will earn no score for the homework.  
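
The penalty rule above can be checked with a short calculation. This is an illustrative sketch only: the function name `penalized_score` is hypothetical, it assumes late days are counted as whole days after the 10/11 deadline, and it does not model the zero-score cutoff after 10/16.

```python
def penalized_score(original_score: float, days_late: int) -> float:
    """Apply the 0.9^k late penalty, where k is the number of late days."""
    if days_late < 0:
        raise ValueError("days_late must be non-negative")
    return original_score * 0.9 ** days_late


# Example from the policy: submitted on 10/14, i.e. 14 - 11 = 3 days late.
print(round(penalized_score(80, 3), 2))  # 58.32
```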
 
General Instructions 
 
Submission instructions: These questions require thought but do not require long answers. 
Please be as concise as possible. Submit your answers as a writeup in PDF format. 
For questions that require coding, write your code for each question in a single source code 
file named after the question number (e.g., question_1.java or question_1.py). Finally, 
put your PDF answer file and all the code files in a folder named with your name and NetID (i.e., 
Firstname-Lastname-NetID.pdf), compress the folder as a zip file (e.g., Firstname-Lastname-NetID.zip), 
and submit the zip file via Canvas. 
 
For the answer writeup PDF file, we have provided both a Word template and a LaTeX template. 
After you finish writing, save the file as a PDF, and submit both the original 
file (Word or LaTeX) and the PDF file. 
 
Questions 
 
1. Map-Reduce (35 pts) 
 
Write a MapReduce program in Hadoop that implements a simple “People You Might Know” 
social network friendship recommendation algorithm. The key idea is that if two people have a 
lot of mutual friends, then the system should recommend that they connect with each other. 
 
Input: Use the provided input file hw1q1.zip. 
 
The input file contains the adjacency list and has multiple lines in the following format: 
<User><TAB><Friends> 
Here, <User> is a unique integer ID corresponding to a unique user and <Friends> is a comma-separated 
list of unique IDs corresponding to the friends of the user with the unique ID <User>. 
Note that the friendships are mutual (i.e., edges are undirected): if A is a friend of B, then B is 
also a friend of A. The data provided is consistent with that rule, as there is an explicit entry for 
each side of each edge. 
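
As a small illustration of the input format, the sketch below parses one adjacency-list line. The function name `parse_adjacency_line` is hypothetical, and it assumes a user with no friends appears with an empty <Friends> field.

```python
def parse_adjacency_line(line: str) -> tuple[int, list[int]]:
    """Split one '<User><TAB><Friends>' line into (user, friends)."""
    user_part, _, friends_part = line.rstrip("\n").partition("\t")
    # An empty friends field yields an empty list rather than an error.
    friends = [int(f) for f in friends_part.split(",") if f.strip()]
    return int(user_part), friends


print(parse_adjacency_line("0\t1,2,3"))  # (0, [1, 2, 3])
print(parse_adjacency_line("7\t"))       # (7, [])
```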
 
Algorithm: Let us use a simple algorithm such that, for each user U, the algorithm recommends 
N = 10 users who are not already friends with U, but have the largest number of mutual friends 
in common with U. 
 
Output: The output should contain one line per user in the following format: 
 
<User><TAB><Recommendations> 
 
where <User> is a unique ID corresponding to a user and <Recommendations> is a comma-separated 
list of unique IDs corresponding to the algorithm’s recommendation of people that 
<User> might know, ordered by decreasing number of mutual friends. Even if a user has 
fewer than 10 second-degree friends, output all of them in decreasing order of the number of 
mutual friends. If a user has no friends, you can provide an empty list of recommendations. If 
there are multiple users with the same number of mutual friends, ties are broken by ordering 
them in a numerically ascending order of their user IDs. 
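
The ordering and tie-breaking rule above can be sketched in plain Python, independent of Hadoop. The function name `format_recommendations` and the sample counts are hypothetical; the sketch assumes the mutual-friend counts for non-friend candidates have already been computed.

```python
def format_recommendations(user: int, mutual_counts: dict[int, int], n: int = 10) -> str:
    """Order candidates by decreasing mutual-friend count, then ascending ID."""
    ranked = sorted(mutual_counts.items(), key=lambda item: (-item[1], item[0]))
    top_ids = [str(candidate) for candidate, _ in ranked[:n]]
    return f"{user}\t{','.join(top_ids)}"


# Made-up counts for user 1: candidates 5 and 3 tie with 2 mutual friends each.
print(repr(format_recommendations(1, {5: 2, 3: 2, 9: 4})))  # '1\t9,3,5'
```

A user with no candidates produces the user ID followed by a tab and an empty list.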
 
Also, please provide a description of how you are going to use MapReduce jobs to solve this 
problem. We only need a very high-level description of your strategy to tackle this problem. 
 
Note: It is possible to solve this question with a single MapReduce job. But if your solution 
requires multiple MapReduce jobs, then that is fine too. 
 
What to submit: 
 
(i) The source code as a single source code file named as the question number (e.g., 
question_1.java). 
 
(ii) Include in your writeup a short paragraph describing your algorithm to tackle this problem. 
 
(iii) Include in your writeup the recommendations for the users with following user IDs: 
924, 8941, 8942, **19, **20, **21, **22, 99**, 9992, 9993. 
 
 
2. Association Rules (35 pts) 
 
Association Rules are frequently used for Market Basket Analysis (MBA) by retailers to 
understand the purchase behavior of their customers. This information can then be used for 
many different purposes such as cross-selling and up-selling of products, sales promotions, 
loyalty programs, store design, discount plans and many others. 
 
Evaluation of item sets: Once you have found the frequent itemsets of a dataset, you need to 
choose a subset of them as your recommendations. Commonly used metrics for measuring 
significance and interest for selecting rules for recommendations are: 
 
2a. Confidence (denoted as conf(A → B)): Confidence is defined as the probability of 
occurrence of B in the basket if the basket already contains A: 
 
conf(A → B) = Pr(B|A), 
 
where Pr(B|A) is the conditional probability of finding item set B given that item set A is 
present. 
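
To make the definition concrete, the sketch below estimates conf(A → B) from basket counts as support(A ∪ B) / support(A). The helper names and the toy baskets are made up for illustration, and the sketch assumes A occurs in at least one basket.

```python
def support(baskets: list[set[str]], items: set[str]) -> int:
    """Number of baskets that contain every item in `items`."""
    return sum(1 for basket in baskets if items <= basket)


def confidence(baskets: list[set[str]], a: set[str], b: set[str]) -> float:
    """conf(A -> B) = Pr(B | A), estimated as support(A and B) / support(A)."""
    return support(baskets, a | b) / support(baskets, a)


# Toy data: milk appears in 3 baskets, milk and bread together in 2.
toy = [{"milk", "bread"}, {"milk"}, {"milk", "bread", "eggs"}, {"bread"}]
print(round(confidence(toy, {"milk"}, {"bread"}), 3))  # 0.667
```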
 
2b. Lift (denoted as lift(A → B)): Lift measures how much more “A and B occur together” than 
“what would be expected if A and B were statistically independent”: 

lift(A → B) = conf(A → B) / S(B), 

where S(B) = Support(B) / N and N is the total number of transactions (baskets). 
 
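2c. Conviction (denoted as conv(A → B)): Conviction compares the “probability that A appears without B if 
they were independent” with the “actual frequency of the appearance of A without B”: 

conv(A → B) = (1 − S(B)) / (1 − conf(A → B)), 

with S(B) the fraction of baskets that contain B. 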
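
The following sketch computes lift and conviction on toy data. It assumes the standard definitions lift(A → B) = conf(A → B) / S(B) and conv(A → B) = (1 − S(B)) / (1 − conf(A → B)), with S(B) the fraction of baskets containing B. All function names and the toy baskets are hypothetical, and treating conviction as infinite when the confidence is 1 is a convention chosen here.

```python
def support_fraction(baskets: list[set[str]], items: set[str]) -> float:
    """Fraction of baskets that contain every item in `items`."""
    return sum(1 for basket in baskets if items <= basket) / len(baskets)


def confidence(baskets, a, b):
    return support_fraction(baskets, a | b) / support_fraction(baskets, a)


def lift(baskets, a, b):
    """Assumed definition: conf(A -> B) / S(B)."""
    return confidence(baskets, a, b) / support_fraction(baskets, b)


def conviction(baskets, a, b):
    """Assumed definition: (1 - S(B)) / (1 - conf(A -> B))."""
    conf = confidence(baskets, a, b)
    if conf == 1.0:
        return float("inf")  # convention used in this sketch
    return (1 - support_fraction(baskets, b)) / (1 - conf)


toy = [{"milk", "bread"}, {"milk"}, {"milk", "bread", "eggs"}, {"bread"}]
print(round(lift(toy, {"milk"}, {"bread"}), 3))        # 0.889
print(round(conviction(toy, {"milk"}, {"bread"}), 3))  # 0.75
```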
 
(a) [5 pts] 
 
A drawback of using confidence is that it ignores Pr(B). Why is this a drawback? Explain why lift 
and conviction do not suffer from this drawback. 
 
(b) [5 pts] 
 
A measure is symmetrical if measure(A → B) = measure(B → A). Which of the measures 
presented here are symmetrical? For each measure, please provide either a proof that the 
measure is symmetrical, or a counterexample that shows the measure is not symmetrical. 
 
(c) [5 pts] 

A measure is desirable if its value is maximal for rules that hold 100% of the time (such rules are 
called perfect implications). This makes it easy to identify the best rules. Which of the above 
measures have this property? Explain why. 
 
 
Product Recommendations: The action or practice of selling additional products or services to 
existing customers is called cross-selling. Giving product recommendation is one of the 
examples of cross-selling that are frequently used by online retailers. One simple method to 
give product recommendations is to recommend products that are frequently browsed 
together by the customers. 
 
Suppose we want to recommend new products to the customer based on the products they 
have already browsed on the online website. Write a program using the A-priori algorithm to 
find products which are frequently browsed together. Fix the support to s = 100 (i.e. product 
pairs need to occur together at least 100 times to be considered frequent) and find itemsets of 
size 2 and 3. 
 
Use the provided browsing behavior dataset browsing.txt. Each line represents a browsing 
session of a customer. On each line, each string of 8 characters represents the id of an item 
browsed during that session. The items are separated by spaces. 
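
The two-pass idea behind A-priori for pairs can be sketched in memory as follows. This is a minimal illustration under stated assumptions, not a reference solution: the function name `frequent_pairs` is hypothetical, the toy sessions use one-letter IDs instead of the 8-character IDs in browsing.txt, and each item is counted at most once per session.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(sessions: list[list[str]], s: int) -> dict[tuple[str, str], int]:
    """Two-pass A-priori: count items, then count pairs of frequent items."""
    # Pass 1: count single items (each item counted once per session).
    item_counts = Counter(item for session in sessions for item in set(session))
    frequent_items = {item for item, count in item_counts.items() if count >= s}

    # Pass 2: count only pairs whose members are both frequent.
    pair_counts = Counter()
    for session in sessions:
        candidates = sorted(set(session) & frequent_items)
        pair_counts.update(combinations(candidates, 2))
    return {pair: count for pair, count in pair_counts.items() if count >= s}


# Toy sessions in the same space-separated layout as the dataset.
toy_sessions = [line.split() for line in ["A B C", "A B", "A C", "B C", "A B C"]]
print(frequent_pairs(toy_sessions, s=3))
```

Sorting each session's candidates keeps every pair in a canonical (smaller, larger) order, so (A, B) and (B, A) are counted as the same pair.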
 
Note: for the following questions (d) and (e), the writeup will require a specific rule ordering 
but the program need not sort the output. 
 
(d) [10 pts] 
 
Identify pairs of items (X, Y) such that the support of {X, Y} is at least 100. For all such pairs, 
compute the confidence scores of the corresponding association rules: X ⇒ Y, Y ⇒ X. Sort the 
rules in decreasing order of confidence scores and list the top 5 rules in the writeup. Break ties, 
if any, by lexicographically increasing order on the left hand side of the rule. 
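
The required ordering (decreasing confidence, ties broken by the left-hand side) maps onto a single sort key. In the sketch below the function name `top_rules` is hypothetical and the rules and scores are made up for illustration.

```python
def top_rules(rules: list[tuple[str, str, float]], k: int = 5) -> list[tuple[str, str, float]]:
    """Sort (lhs, rhs, confidence) rules by decreasing confidence, then by lhs."""
    return sorted(rules, key=lambda rule: (-rule[2], rule[0]))[:k]


# Made-up rules: two of them tie at confidence 0.9.
example = [("B", "A", 0.9), ("A", "B", 0.9), ("C", "A", 0.95)]
print(top_rules(example))
# [('C', 'A', 0.95), ('A', 'B', 0.9), ('B', 'A', 0.9)]
```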
 
(e) [10 pts] 
 
Identify item triples (X, Y, Z) such that the support of {X, Y, Z} is at least 100. For all such triples, 
compute the confidence scores of the corresponding association rules: (X, Y) ⇒ Z, (X, Z) ⇒ Y, 
and (Y, Z) ⇒ X. Sort the rules in decreasing order of confidence scores and list the top 5 rules in 
the writeup. Order the left-hand-side pair lexicographically and break ties, if any, by 
lexicographical order of the first then the second item in the pair. 
 
What to submit: 
 
Include your properly named code file (e.g., question_2.java or question_2.py), and include the 
answers to the following questions in your writeup: 

(i) Explanation for 2(a). 
 
(ii) Proofs and/or counterexamples for 2(b). 
 
(iii) Explanation for 2(c). 
 
(iv) Top 5 rules with confidence scores for 2(d). 
 
(v) Top 5 rules with confidence scores for 2(e). 
 
3. Locality-Sensitive Hashing (30 pts) 
 
When simulating a random permutation of rows, as described in Sec 3.3.5 of MMDS textbook, 
we could save a lot of time if we restricted our attention to a randomly chosen k of the n rows, 
rather than hashing all the row numbers. The downside of doing so is that if none of the k rows 
contains a 1 in a certain column, then the result of the min-hashing is “don’t know,” i.e., we get 
no row number as a min-hash value. It would be a mistake to assume that two columns that 
both min-hash to “don’t know” are likely to be similar. However, if the probability of getting 
“don’t know” as a min-hash value is small, we can tolerate the situation, and simply ignore such 
min-hash values when computing the fraction of min-hashes in which two columns agree. 
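
The “don’t know” behavior described above can be sketched as follows. The function names are hypothetical, `None` stands in for “don’t know,” and the sketch assumes a signature position is ignored whenever either column's value is “don’t know.”

```python
from typing import Optional, Sequence


def minhash_on_sample(column: Sequence[int], sampled_rows_in_order: Sequence[int]) -> Optional[int]:
    """Return the first sampled row (in permuted order) holding a 1, or None."""
    for row in sampled_rows_in_order:
        if column[row] == 1:
            return row
    return None  # none of the k sampled rows has a 1: "don't know"


def agreement_fraction(sig_a: Sequence[Optional[int]], sig_b: Sequence[Optional[int]]) -> Optional[float]:
    """Fraction of agreeing min-hashes, skipping positions with a "don't know"."""
    usable = [(a, b) for a, b in zip(sig_a, sig_b) if a is not None and b is not None]
    if not usable:
        return None
    return sum(a == b for a, b in usable) / len(usable)


column = [0, 0, 1, 0, 1]
print(minhash_on_sample(column, [3, 4, 0]))           # 4
print(minhash_on_sample(column, [0, 1]))              # None
print(agreement_fraction([4, None, 2], [4, 1, 3]))    # 0.5
```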
 
(a) [10 pts] 
 
Suppose a column has m 1’s and therefore (n − m) 0’s. Prove that the probability we get 
“don’t know” as the min-hash value for this column is at most ((n − k)/n)^m. 
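
A numeric sanity check can build intuition before attempting the proof. The sketch below assumes the bound reads ((n − k)/n)^m and that the k rows are chosen uniformly without replacement; the function names and the sample values of n, m, and k are illustrative only. A check like this is not a substitute for the proof.

```python
from math import comb


def dont_know_probability(n: int, m: int, k: int) -> float:
    """Exact probability that none of k uniformly chosen rows is one of the m 1's."""
    return comb(n - m, k) / comb(n, k)


def upper_bound(n: int, m: int, k: int) -> float:
    """The claimed bound ((n - k) / n)^m."""
    return ((n - k) / n) ** m


n, m, k = 100, 5, 20
print(dont_know_probability(n, m, k) <= upper_bound(n, m, k))  # True
```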
 
(b) [10 pts] 
 
Suppose we want the probability of “don’t know” to be at most e^(−10). Assuming n and m are 
both very large (but n is much larger than m or k), give a simple approximation to the smallest 
value of k that will assure this probability is at most e^(−10). Hints: (1) You can use ((n − k)/n)^m as the 
exact value of the probability of “don’t know.” (2) Remember that for large x, (1 − 1/x)^x ≈ 1/e. 
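
Hint (2), read here as (1 − 1/x)^x ≈ 1/e for large x, can be checked numerically. The function name below is hypothetical and the sketch only illustrates how quickly the approximation improves as x grows.

```python
import math


def one_minus_inverse_power(x: float) -> float:
    """Evaluate (1 - 1/x)^x."""
    return (1 - 1 / x) ** x


# The gap to 1/e shrinks as x grows.
for x in (10, 1_000, 100_000):
    gap = abs(one_minus_inverse_power(x) - 1 / math.e)
    print(x, gap)
```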
 
(c) [10 pts] 
 
Note: This question should be considered separate from the previous two parts, in that we are 
no longer restricting our attention to a randomly chosen subset of the rows. 

When min-hashing, one might expect that we could estimate the Jaccard similarity without 
using all possible permutations of rows. For example, we could only allow cyclic permutations, 
i.e., start at a randomly chosen row r, which becomes the first in the order, followed by rows 
r+1, r+2, and so on, down to the last row, and then continuing with the first row, second row, 
and so on, down to row r−1. There are only n such permutations if there are n rows. However, 
these permutations are not sufficient to estimate the Jaccard similarity correctly. 
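
The helpers below sketch how one might compute both quantities for a candidate pair of columns. The function names are hypothetical, rows are indexed from 0, and the sketch assumes each column contains at least one 1. The usage shown is deliberately a trivial case (identical columns) where the two values coincide.

```python
from fractions import Fraction


def jaccard(col1: list[int], col2: list[int]) -> Fraction:
    """Jaccard similarity of the sets represented by two 0/1 columns."""
    both = sum(1 for a, b in zip(col1, col2) if a and b)
    either = sum(1 for a, b in zip(col1, col2) if a or b)
    return Fraction(both, either)


def cyclic_minhash(column: list[int], start: int):
    """First row holding a 1 when rows are visited in the order start, start+1, ... (wrapping)."""
    n = len(column)
    for offset in range(n):
        row = (start + offset) % n
        if column[row]:
            return row
    return None


def cyclic_agreement_probability(col1: list[int], col2: list[int]) -> Fraction:
    """Fraction of the n cyclic permutations on which the two min-hash values agree."""
    n = len(col1)
    agree = sum(1 for r in range(n) if cyclic_minhash(col1, r) == cyclic_minhash(col2, r))
    return Fraction(agree, n)


s1 = [0, 1, 0, 1]
s2 = [0, 1, 0, 1]
print(jaccard(s1, s2), cyclic_agreement_probability(s1, s2))  # 1 1
```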
 
Give an example of two columns such that the probability (over cyclic permutations only) that 
their min-hash values agree is not the same as their Jaccard similarity. In your answer, please 
provide (a) an example of a matrix with two columns (let the two columns correspond to sets 
denoted by S1 and S2) (b) the Jaccard similarity of S1 and S2, and (c) the probability that a 
random cyclic permutation yields the same min-hash value for both S1 and S2. 
 
What to submit: 
 
Include the following in your writeup: 
 
(i) Proof for 3(a) 
 
(ii) Derivation and final answer for 3(b) 
 
(iii) Example for 3(c) 
 