MindSpore 25-Day Learning Camp, Day 10 | K-Nearest Neighbors Algorithm

2301_78538042 · Published 2024-07-15 19:26:20

The k-nearest neighbors (KNN) algorithm is a well-known algorithm, proposed long ago, that does not need many computing resources. Here we explore how to run it.

KNN predicts from the nearest neighbors:

1) Find the k training samples nearest to the given sample (the one to be tested), and count how many of them belong to each kind.

2) Take the argmax, i.e. the kind with the highest count.

We use the Euclidean distance.
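The two steps above can be sketched in plain Python before moving to MindSpore. This is only an illustration under assumptions: the function names `euclidean` and `knn_predict` and the toy data are made up here and are not part of the original post.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def knn_predict(x, train_X, train_y, k):
    """Predict the label of x by majority vote among its k nearest training samples."""
    # Step 1: rank training samples by distance to x and keep the k closest
    order = sorted(range(len(train_X)), key=lambda i: euclidean(x, train_X[i]))
    nearest_labels = [train_y[i] for i in order[:k]]
    # Step 2: argmax over the per-class counts
    return Counter(nearest_labels).most_common(1)[0][0]

# Toy data (made up for illustration): two well-separated 2-D clusters
train_X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.8, 5.2)]
train_y = [1, 1, 1, 2, 2, 2]
pred_a = knn_predict((0.3, 0.3), train_X, train_y, k=3)
pred_b = knn_predict((4.9, 5.0), train_X, train_y, k=3)
print(pred_a, pred_b)  # 1 2
```

Sorting all distances is the simplest way to find the k nearest samples; for a dataset of 178 rows the cost is negligible.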

Here is an example.

The Wine dataset contains three kinds of wine produced in the same place in Italy. Each sample has 13 characteristics: 1. alcohol, 2. malic acid, 3. ash, ..., 13. proline.

Q: Given the 13 features of a wine, predict which kind it is.

First, we read some data.

import csv

# Read every row of the comma-separated data file into a list of string lists
with open("wine.data") as csv_file:
    data = list(csv.reader(csv_file, delimiter=','))
print(data[56:62]+ data[130:133])

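The printed rows include samples of all three kinds. In each row, the first column is the kind and the remaining 13 columns are the features.

A few small steps to prepare the data:

Take the 13 features as X and the 3 kinds as Y.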

import numpy as np
# Column 0 is the kind (label); columns 1 to 13 are the features
X = np.array([[float(x) for x in s[1:]] for s in data[:178]], np.float32)
Y = np.array([s[0] for s in data[:178]], np.int32)

Visualize:

import matplotlib.pyplot as plt

attrs = ['Alcohol', 'Malic acid', 'Ash', 'Alcalinity of ash', 'Magnesium', 'Total phenols','Flavanoids', 'Nonflavanoid phenols', 'Proanthocyanins', 'Color intensity', 'Hue','OD280/OD315 of diluted wines', 'Proline']

plt.figure(figsize = (10,8))
# Plot the first 8 features in pairs, one subplot per pair
for i in range(0,4):
    plt.subplot(2,2, i+1)
    a1,a2 = 2*i, 2*i+1
    # Rows 0-58 are kind 1, rows 59-129 are kind 2, the rest are kind 3
    plt.scatter(X[:59,a1],X[:59,a2],label='1')
    plt.scatter(X[59:130,a1], X[59:130,a2], label = '2')
    plt.scatter(X[130:,a1], X[130:, a2], label = '3')
    plt.xlabel(attrs[a1])
    plt.ylabel(attrs[a2])
    plt.legend()
plt.show()

Next, we separate the samples into training and test subsets.
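The post does not show the split itself. A minimal sketch of one way to do it follows, assuming 128 of the 178 samples go to training (this matches the 128 hard-coded in the model's tile call); the helper name `split_indices` and the `seed` parameter are hypothetical.

```python
import random

def split_indices(n_samples, n_train, seed=None):
    """Return (train_idx, test_idx): disjoint, sorted index lists covering range(n_samples)."""
    rng = random.Random(seed)
    # Draw n_train distinct indices at random for the training subset
    train_idx = sorted(rng.sample(range(n_samples), n_train))
    train_set = set(train_idx)
    # Everything not chosen for training goes to the test subset
    test_idx = [i for i in range(n_samples) if i not in train_set]
    return train_idx, test_idx

train_idx, test_idx = split_indices(178, 128, seed=0)
print(len(train_idx), len(test_idx))  # 128 50
```

With the X and Y arrays built earlier, `X_train, Y_train = X[train_idx], Y[train_idx]` and `X_test, Y_test = X[test_idx], Y[test_idx]` would then give the four arrays used by the prediction loop.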

Define the model:

import mindspore as ms
from mindspore import nn, ops

class KnnNet(nn.Cell):
    def __init__(self, k):
        super(KnnNet, self).__init__()
        self.k = k

    def construct(self, x, X_train):
        # Tile the input x to match the 128 samples in X_train
        x_tile = ops.tile(x, (128, 1))
        square_diff = ops.square(x_tile - X_train)
        square_dist = ops.sum(square_diff, 1)
        dist = ops.sqrt(square_dist)
        # Negate the distances so that top-k returns the k nearest samples
        values, indices = ops.topk(-dist, self.k)
        return indices

def knn(knn_net, x, X_train, Y_train):
    x, X_train = ms.Tensor(x), ms.Tensor(X_train)
    indices = knn_net(x, X_train)
    # Vote counts indexed by kind; the list has k slots, so the kinds (1 to 3) must be smaller than k
    topk_cls = [0]*len(indices.asnumpy())
    for idx in indices.asnumpy():
        topk_cls[Y_train[idx]] += 1
    cls = np.argmax(topk_cls)
    return cls

The model only computes the Euclidean distances and returns the indices of the k nearest training samples. We then count the kinds among these top-k neighbors.
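The two ideas in the model, taking the top-k of the negated distances and counting the kinds of the selected neighbors, can be checked on a tiny example without MindSpore. This is a sketch under assumptions: the helper names and the numbers are made up, and the counter list is sized by the number of classes, which is a variation on the post's code rather than the author's method.

```python
import heapq

def k_smallest_indices(dist, k):
    """Indices of the k smallest distances, found as the top-k of the negated values."""
    return heapq.nlargest(k, range(len(dist)), key=lambda i: -dist[i])

def vote(labels, indices, n_classes):
    """Count the labels of the selected samples and return the most frequent one."""
    counts = [0] * (n_classes + 1)   # slot 0 is unused because labels start at 1
    for idx in indices:
        counts[labels[idx]] += 1
    return counts.index(max(counts))

dist = [2.5, 0.3, 1.7, 0.9, 4.0, 0.5]   # made-up distances to six training samples
labels = [3, 1, 2, 1, 3, 2]             # made-up kinds of those samples
nearest = k_smallest_indices(dist, 3)
winner = vote(labels, nearest, n_classes=3)
print(nearest)  # [1, 5, 3]
print(winner)   # 1
```

Sizing the counters by the number of classes keeps the vote valid for any k, including values of k smaller than the largest label.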

Now we predict on the test set.

acc = 0
knn_net = KnnNet(5)
for x, y in zip(X_test, Y_test):
    pred = knn(knn_net, x, X_train, Y_train)
    acc += (pred == y)
    print("label :%d ,prediction:%s" %(y,pred))
print("Val acc: %f" % (acc/len(Y_test)))

The accuracy is not very good.
