openeuler/spark docker image overview

Quick reference

- The official Spark docker image.
- Maintained by: openEuler CloudNative SIG.
- Where to get help: openEuler CloudNative SIG, openEuler.
Current Spark docker images are built on openEuler. This repository is free to use and exempted from per-user rate limits.
Apache Spark is a multi-language engine for executing data engineering, data science, and machine learning on single-node machines or clusters. It provides high-level APIs in Scala, Java, Python, and R, and an optimized engine that supports general computation graphs for data analysis. It also supports a rich set of higher-level tools including Spark SQL for SQL and DataFrames, pandas API on Spark for pandas workloads, MLlib for machine learning, GraphX for graph processing, and Structured Streaming for stream processing.
Learn more on the Spark website.
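For a flavor of the DataFrame and Spark SQL APIs mentioned above, here is a minimal Scala sketch (the session setup is only needed outside spark-shell, where `spark` is already bound; the names and data are illustrative only):

```scala
import org.apache.spark.sql.SparkSession

// Create (or reuse) a session; inside spark-shell this already exists as `spark`.
val spark = SparkSession.builder().appName("quick-demo").getOrCreate()
import spark.implicits._

// Build a tiny DataFrame and query it through Spark SQL.
val people = Seq(("alice", 29), ("bob", 31)).toDF("name", "age")
people.createOrReplaceTempView("people")
spark.sql("SELECT name FROM people WHERE age > 30").show()
```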
Supported tags and respective Dockerfile links
The tag of each Spark docker image consists of the Spark version and the version of the base image. The details are as follows:
| Tag | Description | Architectures |
|---|---|---|
| 3.3.1-22.03-lts | Spark 3.3.1 on openEuler 22.03-LTS | amd64, arm64 |
| 3.3.2-22.03-lts | Spark 3.3.2 on openEuler 22.03-LTS | amd64, arm64 |
| 3.4.0-22.03-lts | Spark 3.4.0 on openEuler 22.03-LTS | amd64, arm64 |
Usage
In the following commands, replace {Tag} with the image tag that matches your requirements (see the table above).
- Online Documentation

  You can find the latest Spark documentation, including a programming guide, on the project web page. This README only contains basic setup instructions.

- Pull the openeuler/spark image from Docker Hub:

  ```
  docker pull openeuler/spark:{Tag}
  ```
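  For example, to pull the Spark 3.4.0 image from the table above:

  ```
  docker pull openeuler/spark:3.4.0-22.03-lts
  ```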
Interactive Scala Shell
The easiest way to start using Spark is through the Scala shell:docker run -it --name spark openeuler/spark:{Tag} /opt/spark/bin/spark-shellTry the following command, which should return 1,000,000,000:
scala> spark.range(1000 * 1000 * 1000).count()
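  Note that `--name spark` keeps the stopped container around after you exit, so rerunning the same command fails with a name conflict; either remove the old container first, or pass `--rm` so it is cleaned up automatically:

  ```
  docker rm spark   # free the name for the next run
  # or start a throwaway container instead:
  docker run --rm -it openeuler/spark:{Tag} /opt/spark/bin/spark-shell
  ```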
- Interactive Python Shell

  The easiest way to start using PySpark is through the Python shell:

  ```
  docker run -it --name spark openeuler/spark:{Tag} /opt/spark/bin/pyspark
  ```

  And run the following command, which should also return 1,000,000,000:

  ```
  >>> spark.range(1000 * 1000 * 1000).count()
  ```
- Running Spark on Kubernetes

  See https://spark.apache.org/docs/latest/running-on-kubernetes.html.
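  As a rough sketch of how this image plugs into that guide, a cluster-mode spark-submit might look like the following (the API server address is a placeholder, and the examples jar path assumes the image keeps Spark's stock layout under /opt/spark; adjust the jar version to the tag you pulled):

  ```
  /opt/spark/bin/spark-submit \
    --master k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port> \
    --deploy-mode cluster \
    --name spark-pi \
    --class org.apache.spark.examples.SparkPi \
    --conf spark.kubernetes.container.image=openeuler/spark:{Tag} \
    local:///opt/spark/examples/jars/spark-examples_2.12-3.4.0.jar
  ```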
- Configuration and environment variables

  See more in https://github.com/apache/spark-docker/blob/master/OVERVIEW.md#environment-variable.
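  Environment variables can be passed to the container with docker's -e flag. A minimal sketch (the variable name below is a hypothetical placeholder; see the linked OVERVIEW.md for the variables the image actually honors):

  ```
  # SPARK_EXAMPLE_VAR is a placeholder, not a documented variable
  docker run -it -e SPARK_EXAMPLE_VAR=value openeuler/spark:{Tag} /opt/spark/bin/spark-shell
  ```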
Questions and answers
If you have any questions or need a special feature, please submit an issue or a pull request to openeuler-docker-images.