Jupyter is a web application that allows users to create and share documents containing live code, equations, visualizations and narrative text. This section describes how to configure a Jupyter Python 3 notebook to allow access to Koverse data sets.
The prerequisites for accessing Koverse data sets in a Jupyter Python 3 notebook are Spark 1.6, Python 3.5, and Jupyter.
Once the prerequisites are met, download a koverse-spark-datasource JAR file. The version you download should match your installed version of Koverse. You can find the JAR files here:
Next, add or update the following environment variables. Be sure to replace /opt/spark with the location of your Spark 1.6 installation and /usr/local/bin/python with the location of your Python 3 executable:
export SPARK_HOME=/opt/spark
export PATH=$SPARK_HOME/bin:$PATH
export PYSPARK_PYTHON=/usr/local/bin/python
export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS='notebook'
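Before launching, it can help to confirm that the variables above took effect. The following is a minimal sketch of such a check, not part of Koverse or Spark. The function name check_pyspark_env is hypothetical, and it assumes a standard Spark layout in which the launcher script lives at bin/pyspark under SPARK_HOME.

```python
import os
import shutil


def check_pyspark_env(env=None):
    """Return a list of problems found in the PySpark/Jupyter settings.

    Hypothetical helper: `env` defaults to the current process environment.
    An empty list means every check passed.
    """
    env = os.environ if env is None else env
    problems = []

    # SPARK_HOME must point at a Spark install that ships bin/pyspark
    # (assumed standard layout).
    spark_home = env.get("SPARK_HOME")
    if not spark_home:
        problems.append("SPARK_HOME is not set")
    elif not os.path.isfile(os.path.join(spark_home, "bin", "pyspark")):
        problems.append("no bin/pyspark under SPARK_HOME=" + spark_home)

    # PYSPARK_PYTHON must resolve to an executable Python binary.
    python_bin = env.get("PYSPARK_PYTHON")
    if not python_bin:
        problems.append("PYSPARK_PYTHON is not set")
    elif shutil.which(python_bin) is None:
        problems.append("PYSPARK_PYTHON is not executable: " + python_bin)

    # The driver must be Jupyter, started in notebook mode.
    if env.get("PYSPARK_DRIVER_PYTHON") != "jupyter":
        problems.append("PYSPARK_DRIVER_PYTHON should be 'jupyter'")
    if env.get("PYSPARK_DRIVER_PYTHON_OPTS") != "notebook":
        problems.append("PYSPARK_DRIVER_PYTHON_OPTS should be 'notebook'")

    return problems


if __name__ == "__main__":
    for problem in check_pyspark_env():
        print("WARNING:", problem)
```

Run the script in the same shell session in which you exported the variables. If it prints nothing, the settings match those listed above.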
You are now ready to start the Jupyter notebook using pyspark, which is part of the Spark installation:
pyspark --jars <location of koverse-spark-datasource JAR file downloaded, above>
An example of reading a Koverse data set in a Jupyter Python 3 notebook is shown below.
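A read might look like the following minimal sketch. The data source name com.koverse.spark and the option keys hostname and apiToken are assumptions that this section does not confirm, so verify them against the documentation for your Koverse version. The helper name read_koverse_dataset is hypothetical. The calls read, format, options, and load are the standard Spark 1.6 DataFrameReader API.

```python
# ASSUMPTION: the data source name registered by the koverse-spark-datasource JAR.
KOVERSE_FORMAT = "com.koverse.spark"


def read_koverse_dataset(sql_context, hostname, api_token, dataset_name):
    """Load a Koverse data set as a Spark DataFrame (hypothetical helper).

    ASSUMPTION: the connector accepts `hostname` and `apiToken` options
    and takes the data set name as the load path.
    """
    reader = sql_context.read.format(KOVERSE_FORMAT)
    reader = reader.options(hostname=hostname, apiToken=api_token)
    return reader.load(dataset_name)


# In a notebook started through pyspark, the SparkContext `sc` already exists:
#
#   from pyspark.sql import SQLContext
#   sqlContext = SQLContext(sc)
#   df = read_koverse_dataset(sqlContext,
#                             "<your koverse fqdn>",
#                             "<your api token>",
#                             "<your data set name>")
#   df.take(1)
```

The helper takes the SQLContext as a parameter rather than creating one, so the same code works with the context that the pyspark-launched notebook provides.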
Note that there is currently a limitation requiring Koverse data sets to be written as the user