Python


This task lets you run Python scripts using one of the Python versions your Gaio administrator has made available. Gaio developers can install and manage libraries. In addition, Gaio provides a class called bucket that lets you read data from, and write data to, the ClickHouse database your application has permission to use.

Memory Limit

By default, the Python task in Gaio can use at most 80% of the machine's memory. If a script exceeds this limit, the task returns a memory limit error.
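Before loading a large table, it can help to estimate how much of that limit a DataFrame will take. The sketch below is a minimal, hedged example: it measures a DataFrame with pandas' `DataFrame.memory_usage(deep=True)` and reads the machine's physical memory with `os.sysconf`, which assumes a Linux or other Unix machine. The helper names `memory_share` and `total_physical_memory_bytes` are hypothetical, and if the task runs inside a container, the effective limit may be lower than what `os.sysconf` reports.

```python
import os

import numpy as np
import pandas as pd

# Default share of machine memory available to the Python task.
MEMORY_LIMIT_SHARE = 0.80


def total_physical_memory_bytes():
    """Total physical memory of the machine (Unix-only: uses os.sysconf)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


def memory_share(df):
    """Fraction of the task's memory limit used by df (hypothetical helper)."""
    df_bytes = int(df.memory_usage(deep=True).sum())
    limit_bytes = total_physical_memory_bytes() * MEMORY_LIMIT_SHARE
    return df_bytes / limit_bytes


# Example: one million float64 values, roughly 8 MB of data.
df = pd.DataFrame({"value": np.zeros(1_000_000)})
print(f"DataFrame uses {memory_share(df):.4%} of the memory limit")
```

The same call works on a DataFrame returned by the bucket class, which gives a rough idea of how close a script is to the limit.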

Below, we walk through the task interface and then develop a simple script as an example.

To use Python in Gaio, simply access the Tasks menu and choose Python.

The first page is the task's main page. On the left, in a blue theme, is the editor where you write the script; on the right, in a dark theme, is the console, which shows the script's output. To run your script, click the "run" button; the result is displayed in the console.

The second page is assets, the default directory where files generated by your script are saved, such as JPEG, PNG, MP4 and PKL files.

There are three other folders of your application that you can use from the Python task: the content, inputs and outputs folders. Their paths are stored in the variables app_inputs, app_outputs and app_assets.

The example below builds a path inside the outputs folder, so that you can download the generated image.

path = app_outputs + "/imagem_name.png"
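The same idea can be used to save a chart directly into the outputs folder. This sketch assumes, as described above, that Gaio provides `app_outputs` as a string path; the fallback to a temporary directory is a hypothetical addition so the snippet also runs outside Gaio. It uses `os.path.join` instead of string concatenation, which avoids missing or doubled path separators.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # render without a display, e.g. on a server
import matplotlib.pyplot as plt

# Inside Gaio, app_outputs is provided by the task; outside Gaio, fall back
# to a temporary directory (this fallback exists only for local testing).
try:
    outputs_dir = app_outputs  # noqa: F821 (defined by Gaio)
except NameError:
    outputs_dir = tempfile.mkdtemp()

path = os.path.join(outputs_dir, "image_name.png")

fig, ax = plt.subplots()
ax.plot([1, 2, 3], [2, 4, 1])
ax.set_title("Example chart")
fig.savefig(path)
plt.close(fig)  # free the figure's memory

print("Saved to", path)
```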

To install libraries, write the name of each library you want to install on its own line in the text box, using just the name with no other characters. After choosing the Python version and the libraries, click the "Install" button to apply your configuration.
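After installation, you can confirm from the console which libraries are available and which versions were installed. This is a minimal sketch using the standard library's `importlib.metadata` (Python 3.8 or later); the helper name `installed_versions` and the list of libraries are only illustrative. It expects distribution names as used by pip (for example `scikit-learn`, not the import name `sklearn`).

```python
import importlib.metadata


def installed_versions(names):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = importlib.metadata.version(name)
        except importlib.metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Same names as written in the installation text box (illustrative list)
print(installed_versions(["pandas", "scikit-learn", "matplotlib", "joblib"]))
```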

As mentioned earlier, the bucket class connects to ClickHouse in an encapsulated way and provides the query_df, select_df, command, create_df and insert_df methods.

Examples

query_df runs a ClickHouse SELECT query and returns the result as a pandas DataFrame.

df = bucket.query_df("select columnA, columnB from table where columnB = 'active'")

select_df copies an entire ClickHouse table into a pandas DataFrame.

df = bucket.select_df('new_table')

In the example below, the first line creates a ClickHouse table with the same structure as your pandas DataFrame, and the second line inserts the data from the DataFrame into that table.

bucket.create_df('new_table', df)
bucket.insert_df('new_table', df)

Note that insert_df requires your pandas DataFrame to have the same structure as the target ClickHouse table.
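One way to reduce insert errors is to check the DataFrame's columns against the table's columns first. The helper `align_to_table` below is hypothetical and pandas-only: it raises an error if a table column is missing, drops extra DataFrame columns, and reorders the rest to match the table. It checks names only, not types. How you obtain the table's column list in Gaio is an assumption; one option might be the columns of `bucket.query_df("select * from new_table limit 0")`.

```python
import pandas as pd


def align_to_table(df, table_columns):
    """Return df with exactly table_columns, in table order (hypothetical helper).

    Raises ValueError if a table column is missing from df;
    extra DataFrame columns are dropped.
    """
    missing = [col for col in table_columns if col not in df.columns]
    if missing:
        raise ValueError(f"DataFrame is missing table columns: {missing}")
    return df[list(table_columns)]


# Example with in-memory data standing in for a real table's columns
df = pd.DataFrame({"b": [1, 2], "a": ["x", "y"], "extra": [0.1, 0.2]})
aligned = align_to_table(df, ["a", "b"])
print(aligned.columns.tolist())  # ['a', 'b']
```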

Practical example

In this practical example, we bring the data into Python, cluster it, save a chart in PNG format, save the model file, and finally create and populate a table in ClickHouse.

First, let's import the libraries we will use:

import pandas as pd
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt
import joblib

For this example, we use the well-known Iris dataset, which is bundled with several libraries such as scikit-learn. In Gaio, it is stored in the ClickHouse database as the table iris_table.

We use the select_df function to bring it into Python and then apply the K-Means algorithm provided by scikit-learn.

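# Bring the data into Python
data = bucket.select_df('iris_table')

# Use only numeric columns as model features (text columns such as a species label would break KMeans)
features = data.select_dtypes(include='number')

# Apply the K-Means algorithm with 3 clusters (number chosen arbitrarily)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
data['cluster'] = kmeans.fit_predict(features)

# Evaluate the result - for example, viewing the means of each cluster
cluster_means = data.groupby('cluster').mean(numeric_only=True)
print(cluster_means)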
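To try this step outside Gaio, the sketch below swaps `bucket.select_df('iris_table')` for scikit-learn's bundled copy of the dataset, `load_iris(as_frame=True)`. The column names therefore differ from the Gaio table (for example `sepal length (cm)`), and `random_state=42` is an added assumption to make the clusters reproducible between runs.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

# Stand-in for bucket.select_df('iris_table'): the four numeric measurements
data = load_iris(as_frame=True).data.copy()

# Same clustering step as above, with a fixed seed for reproducibility
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
data["cluster"] = kmeans.fit_predict(data)

# Mean of each measurement per cluster, and the number of rows in each cluster
cluster_means = data.groupby("cluster").mean()
print(cluster_means)
print(data["cluster"].value_counts().sort_index())
```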

Next, we visualize the clusters found by the model and save the figure in the assets folder.

# Plot the clusters on a chart (using only the sepal length and sepal width columns)
plt.scatter(data['sepal_length_cm_'], data['sepal_width_cm_'], c=data['cluster'], cmap='viridis')
plt.xlabel('sepal_length_cm_')
plt.ylabel('sepal_width_cm_')

# Save the chart in png format
plt.savefig('assets/cluster_iris.png')

Now let's save the model so it can be reused later. For this, we use the joblib library.

# Save the model
joblib.dump(kmeans, 'assets/modelo_kmeans_iris.joblib')
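To reuse the saved model in a later run, load it back with `joblib.load` and call `predict` on new rows that have the same feature columns used for training. The sketch below is self-contained: it trains on scikit-learn's bundled Iris data and writes to a temporary directory instead of `assets`, so the paths and data are stand-ins for the ones above.

```python
import os
import tempfile

import joblib
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

features = load_iris(as_frame=True).data

# Train and save the model (in Gaio, the path would be under assets/)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42).fit(features)
model_path = os.path.join(tempfile.mkdtemp(), "modelo_kmeans_iris.joblib")
joblib.dump(kmeans, model_path)

# Later, possibly in another script: load the model and assign clusters
loaded = joblib.load(model_path)
new_rows = features.head(5)
print(loaded.predict(new_rows))
```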

Now we can send the DataFrame, with the new column generated by the model, to ClickHouse so that other Gaio tasks can use it. For this, we use create_df and insert_df.

# Create a table in ClickHouse with the same structure as your DataFrame
bucket.create_df('tmp_iris_clusterizada', data)

# Insert the data from your DataFrame into the ClickHouse table
bucket.insert_df('tmp_iris_clusterizada', data)
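To check that the insert worked, you could read the table back with select_df and compare it with the DataFrame you sent. The helper `same_cluster_counts` below is a hypothetical, pandas-only sketch: it compares row counts and the number of rows per cluster, without assuming the rows come back in the same order. Here it is exercised with two in-memory DataFrames; in Gaio, the reloaded DataFrame would presumably come from `bucket.select_df('tmp_iris_clusterizada')`.

```python
import pandas as pd


def same_cluster_counts(sent, reloaded, column="cluster"):
    """True if both DataFrames have the same row count and rows per cluster."""
    if len(sent) != len(reloaded):
        return False
    return sent[column].value_counts().to_dict() == reloaded[column].value_counts().to_dict()


# In Gaio (assumption): reloaded = bucket.select_df('tmp_iris_clusterizada')
sent = pd.DataFrame({"cluster": [0, 1, 1, 2]})
reloaded = pd.DataFrame({"cluster": [1, 2, 0, 1]})  # same rows, different order
print(same_cluster_counts(sent, reloaded))  # True
```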
