Python ClickHouse drivers

clickhouse-connect 0.2.4

pip install clickhouse-connect

Released: Aug 19, 2022

ClickHouse core driver, SqlAlchemy, and Superset libraries


License: Apache Software License (Apache License 2.0)

Requires: Python

Project description

ClickHouse Connect

A suite of Python packages for connecting Python to ClickHouse, initially supporting Apache Superset using a minimal read only SQLAlchemy dialect. Uses the ClickHouse HTTP interface.

Installation

ClickHouse Connect requires Python 3.7 or higher. The cython package must be installed prior to installing clickhouse_connect to build and install the optional Cython/C extensions used for improving read and write performance using the ClickHouse Native format. After installing cython if desired, clone this repository and run python setup.py install from the project directory.

Getting Started

Simple ‘command’ that does not return a result set.

Bulk insert of a matrix of rows and columns.
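The two getting-started operations above can be sketched as follows. This is a minimal sketch, assuming a local server with default credentials; the table name and columns are illustrative, and the server interaction is wrapped in a function so the snippet can be read as a unit:

```python
def getting_started_demo():
    # Requires `pip install clickhouse-connect` and a reachable server;
    # host, credentials, and the 'demo' table are placeholders.
    import clickhouse_connect

    client = clickhouse_connect.get_client(
        host='localhost', username='default', password='')

    # Simple 'command' that does not return a result set
    client.command(
        'CREATE TABLE IF NOT EXISTS demo (key UInt32, value String) '
        'ENGINE MergeTree ORDER BY key')

    # Bulk insert of a matrix of rows and columns
    rows = [[1, 'one'], [2, 'two'], [3, 'three']]
    client.insert('demo', rows, column_names=['key', 'value'])
```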

Minimal SQLAlchemy Support

On installation ClickHouse Connect registers the clickhousedb SQLAlchemy Dialect entry point. This dialect supports basic table reflection for table columns and datatypes, and command and query execution using DB API 2.0 cursors. Most ClickHouse datatypes have full query/cursor support.

ClickHouse Connect does not yet implement the full SQLAlchemy API for DDL (Data Definition Language) or ORM (Object Relational Mapping). These features are in development.

Superset Support

On installation ClickHouse Connect registers the clickhousedb Superset Database Engine Spec entry point. Using the clickhousedb SQLAlchemy dialect, the engine spec supports complete data exploration and Superset SQL Lab functionality with all standard ClickHouse data types. See Connecting Superset for complete instructions.

ClickHouse Enum, UUID, and IP Address datatypes are treated as strings. For compatibility with Superset Pandas dataframes, unsigned UInt64 data types are interpreted as signed Int64 values. ClickHouse CSV Upload via Superset is not yet implemented.

Optional Features

SQLAlchemy and Superset support require the corresponding SQLAlchemy and Apache Superset packages to be included in your Python installation. ClickHouse Connect also includes C/Cython extensions for improved performance reading String and FixedString datatypes. These extensions will be installed automatically by setup.py if a C compiler is available.

Query results can be returned as either a numpy array or a pandas DataFrame if the numpy and pandas libraries are available. Use the client methods query_np and query_df respectively.
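A sketch of the query_np and query_df methods, assuming a local server (the host and query are placeholders):

```python
def numpy_pandas_demo():
    # Requires clickhouse-connect plus numpy and pandas installed.
    import clickhouse_connect

    client = clickhouse_connect.get_client(host='localhost')

    # Same query, returned as a numpy array and as a pandas DataFrame
    np_array = client.query_np(
        'SELECT number, number * 2 AS double FROM system.numbers LIMIT 5')
    df = client.query_df(
        'SELECT number, number * 2 AS double FROM system.numbers LIMIT 5')
    return np_array, df
```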

Tests

Main Client Interface

Interaction with the ClickHouse server is done through a clickhouse_connect Client instance. At this point only an HTTP(s) based Client is supported.

HTTP Client constructor/initialization parameters

Create a ClickHouse client using the clickhouse_connect.driver.create_client() function or the clickhouse_connect.get_client() wrapper. All parameters are optional:

Any remaining keyword parameters are interpreted as ‘setting’ parameters to send to the ClickHouse server with every query/request

Querying data

Use the client query method to retrieve a QueryResult from ClickHouse. Parameters:

The query method returns a QueryResult object with the following fields:
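A minimal sketch of the query method and the returned QueryResult, assuming a local server (host and query are placeholders):

```python
def query_demo():
    # Requires `pip install clickhouse-connect` and a reachable server.
    import clickhouse_connect

    client = clickhouse_connect.get_client(host='localhost')
    result = client.query('SELECT name, database FROM system.tables LIMIT 3')

    print(result.column_names)     # tuple of column names
    for row in result.result_set:  # result_set is a sequence of row tuples
        print(row)
```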

Numpy and Pandas queries

Datatype options for queries

There are some convenience methods in the clickhouse_connect.driver package that control the format of some ClickHouse datatypes. These are included in part to improve Superset compatibility.

Inserting data

Use the client insert method to insert data into a ClickHouse table. Parameters:

Notes on data inserts

The client insert_df can be used to insert a Pandas DataFrame, assuming the column names in the DataFrame match the ClickHouse table column names. Note that a Numpy array can be passed directly as the data parameter to the primary insert method so there is no separate insert_np method.
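A sketch of insert_df, assuming a local server and a hypothetical 'demo' table whose columns match the DataFrame:

```python
def insert_df_demo():
    # Requires clickhouse-connect and pandas; host and table are placeholders.
    import pandas as pd
    import clickhouse_connect

    client = clickhouse_connect.get_client(host='localhost')

    # DataFrame column names must match the ClickHouse table column names
    df = pd.DataFrame({'key': [10, 20], 'value': ['ten', 'twenty']})
    client.insert_df('demo', df)
```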

For column types that can map to different native Python types (for example, UUIDs or IP addresses), the driver assumes that the data type for the whole column matches the first non-None value in the column and processes insert data accordingly. So if the first data value for an insert into a ClickHouse UUID column is a string, the driver will assume all data values in that insert column are strings.

DDL and other ‘simple’ SQL statements

The client command method can be used for ClickHouse commands/queries that return a single result or a single row of result values. In this case the result is returned as TabSeparated values and cast to a string, an int, or a list of string values. The command method parameters are:


External data for query processing

You can pass external data alongside with query:
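A sketch of passing external data with a query using clickhouse-driver's external_tables argument (the table name, structure, and values here are illustrative, and the host is a placeholder):

```python
def external_data_demo():
    # Requires `pip install clickhouse-driver` and a reachable server.
    from clickhouse_driver import Client

    client = Client('localhost')

    # Each external table has a name, a structure, and its row data
    tables = [{
        'name': 'ext',
        'structure': [('x', 'Int32'), ('name', 'String')],
        'data': [{'x': 100, 'name': 'one'}, {'x': 200, 'name': 'two'}],
    }]
    return client.execute('SELECT sum(x) FROM ext', external_tables=tables)
```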

There are a lot of ClickHouse server settings. Settings can be specified during Client initialization:

Client with compression support can be constructed as follows:
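Both options can be sketched together; the setting value and hosts are placeholders, and lz4 compression assumes the driver was installed with its lz4 extra:

```python
def client_options_demo():
    # Requires `pip install clickhouse-driver[lz4]` for the compressed client.
    from clickhouse_driver import Client

    # Server settings specified during Client initialization
    settings = {'max_block_size': 100000}
    client = Client('localhost', settings=settings)

    # Client with compression support
    compressed_client = Client('localhost', compression='lz4')
    return client, compressed_client
```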

CityHash algorithm notes

Unfortunately, the ClickHouse server ships with an old built-in version of the CityHash algorithm (1.0.2). That’s why the original CityHash package can’t be used; an older version is published separately on PyPI.

Specifying query id

You can manually set a query identifier for each query, for example a UUID:
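A sketch of setting a UUID query identifier, assuming a local server:

```python
def query_id_demo():
    # Requires `pip install clickhouse-driver` and a reachable server.
    from uuid import uuid4
    from clickhouse_driver import Client

    client = Client('localhost')
    query_id = str(uuid4())  # any unique string works as the identifier
    return client.execute('SELECT 1', query_id=query_id)
```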

You can cancel a query with a specific id by sending another query with the same query id if the replace_running_query option is set to 1.

Query results are fetched by the same Client instance that emitted the query.

Retrieving results in columnar form

Columnar form can sometimes be more useful.
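A sketch of fetching results in columnar form with the columnar flag (host and query are placeholders):

```python
def columnar_demo():
    # Requires `pip install clickhouse-driver` and a reachable server.
    from clickhouse_driver import Client

    client = Client('localhost')
    # With columnar=True the result is one tuple per column
    # instead of one tuple per row
    return client.execute(
        'SELECT number, number * 2 FROM system.numbers LIMIT 3',
        columnar=True)
```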

Data types checking on INSERT

Data type checking is disabled on INSERT queries for performance reasons. You can turn it on with the types_check option:
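A sketch of an INSERT with types_check enabled; the table 't' and its column are hypothetical:

```python
def types_check_demo():
    # Requires `pip install clickhouse-driver` and a reachable server;
    # assumes a table 't' with a numeric column 'x'.
    from clickhouse_driver import Client

    client = Client('localhost')
    client.execute(
        'INSERT INTO t (x) VALUES', [(1,), (2,)],
        types_check=True)  # values are validated against column types
```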

Query execution statistics

The Client stores statistics about the last query execution. They can be obtained by accessing the last_query attribute. Statistics are sent from the ClickHouse server and calculated on the client side. last_query contains info about:

profile: rows before limit

Receiving server logs

Query logs can be received from server by using send_logs_level setting:
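A sketch of enabling server logs through the send_logs_level setting (host and level are placeholders):

```python
def server_logs_demo():
    # Requires `pip install clickhouse-driver` and a reachable server.
    from clickhouse_driver import Client

    settings = {'send_logs_level': 'debug'}
    client = Client('localhost', settings=settings)
    # Server log records arrive alongside the query result
    return client.execute('SELECT 1')
```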

New in version 0.1.3.

Additional connection points can be defined with alt_hosts. If the main connection point is unavailable, the driver will use the next one from alt_hosts.

This option is useful for a ClickHouse cluster with multiple replicas.

In the example above, on every new connection the driver will try the following sequence of hosts, moving on whenever the previous host is unavailable:

All queries within an established connection will be sent to the same host.
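A sketch of configuring fallback replicas with alt_hosts; the hostnames are placeholders for cluster replicas:

```python
def alt_hosts_demo():
    # Requires `pip install clickhouse-driver`; host1/host2/host3 are
    # hypothetical replica names.
    from clickhouse_driver import Client

    client = Client('host1', alt_hosts='host2:9000,host3:9000')
    # If host1 is down, the driver falls back to host2, then host3
    return client.execute('SELECT 1')
```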

Python DB API 2.0

New in version 0.1.3.

This driver also implements the DB API 2.0 specification. It can be useful for various integrations.

Threads may share the module and connections.

The :ref:`dbapi-connection` class is just a wrapper for handling multiple cursors (clients) and does not initiate actual connections to the ClickHouse server.

There are some non-standard ClickHouse-related :ref:`Cursor methods ` for: external data, settings, etc.

For automatic disposal Connection and Cursor instances can be used as context managers:

You can use cursor_factory argument to get results as dicts or named tuples (since version 0.2.4):
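A sketch of the DB API 2.0 interface combining context managers with a cursor_factory; the DSN is a placeholder:

```python
def dbapi_demo():
    # Requires `pip install clickhouse-driver` (0.2.4+ for cursor_factory)
    # and a reachable server.
    from clickhouse_driver import connect
    from clickhouse_driver.dbapi.extras import DictCursor

    # Connection and Cursor used as context managers for automatic disposal
    with connect('clickhouse://localhost') as conn:
        with conn.cursor(cursor_factory=DictCursor) as cursor:
            cursor.execute('SELECT number FROM system.numbers LIMIT 3')
            return cursor.fetchall()  # rows come back as dicts
```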

New in version 0.1.6.

Direct loading into NumPy arrays increases performance and lowers memory requirements for large numbers of rows.

Direct loading into pandas DataFrame is also supported by using query_dataframe:

Writing pandas DataFrame is also supported with insert_dataframe:

Starting from version 0.2.2, nullable columns are also supported. Keep in mind that nullable columns have object dtype. For convenience, both np.nan and None are accepted as NULL values on insert, but only None is returned for NULL values after selecting.

It’s important to specify dtype during dataframe creation:
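A sketch pulling the pieces above together: query_dataframe, insert_dataframe, and an explicit object dtype for a nullable column. The host and the table 't' are placeholders, and the use_numpy setting is required for the DataFrame methods:

```python
def dataframe_demo():
    # Requires `pip install clickhouse-driver` plus numpy and pandas.
    import pandas as pd
    from clickhouse_driver import Client

    client = Client('localhost', settings={'use_numpy': True})

    # Direct loading into a pandas DataFrame
    df = client.query_dataframe(
        'SELECT number AS x, number * 2 AS y FROM system.numbers LIMIT 5')

    # Nullable columns need object dtype; None (or np.nan) marks NULLs
    to_insert = pd.DataFrame({'x': [1, None, 3]}, dtype=object)
    client.insert_dataframe('INSERT INTO t VALUES', to_insert)
    return df
```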

New in version 0.2.2.

Each Client instance can be used as a context manager:

Upon exit, any established connection to the ClickHouse server will be closed automatically.

clickhouse-driver 0.2.4

pip install clickhouse-driver

Released: Jun 13, 2022

Python driver with native interface for ClickHouse


License: MIT License (MIT)

Tags ClickHouse, db, database, cloud, analytics

Requires: Python >=3.4

Maintainers: xzkostyan

Classifiers

Project description

ClickHouse Python Driver

ClickHouse Python Driver with native (TCP) interface support.

Asynchronous wrapper is available here: https://github.com/mymarilyn/aioch

Features

Documentation

Usage

There are two ways to communicate with server:

Pure Client example:
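A minimal sketch of the pure Client usage, assuming a local server:

```python
def pure_client_demo():
    # Requires `pip install clickhouse-driver` and a reachable server.
    from clickhouse_driver import Client

    client = Client('localhost')
    return client.execute('SELECT now(), version()')
```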

License

ClickHouse Python Driver is distributed under the MIT license.


ClickHouse and Python: Getting to Know the Clickhouse-driver Client


Python is a force in the world of analytics due to powerful libraries like numpy along with a host of machine learning frameworks. ClickHouse is an increasingly popular store of data. As a Python data scientist, you may wonder how to connect them.

Fortunately, the Altinity Blog is here to solve mysteries, at least those that involve ClickHouse. This post contains a review of the clickhouse-driver client. It’s a solidly engineered module that is easy to use and integrates easily with standard tools like Jupyter Notebooks and Anaconda. Clickhouse-driver is a great way to jump into ClickHouse Python connectivity.

So Many Python Choices

The first hurdle for Python users is just picking a suitable driver. Even a quick search on pypi.org shows 22 projects with ClickHouse references. They include SQLAlchemy drivers (3 choices), async clients (also 3), and a Pandas-to-ClickHouse interface among others.

Clickhouse-driver offers a straightforward interface that enables Python clients to connect to ClickHouse, issue SELECT and DDL commands, and process results. It’s a good choice for direct Python connectivity with 16 published releases on pypi.org. The latest version is 0.0.17, published on January 10, 2019. If you want to connect to the data warehouse, issue SQL commands, and fetch back data, clickhouse-driver is a great place to start.

Code and Community

The clickhouse-driver source code is published on Github under an MIT license. The main committer is Konstantin Lebedev (@xzkostyan) though there have been a few contributions from others.

Konstantin is very responsive to questions about the driver, which you can register as issues. Much of my understanding of the wire protocol started from Konstantin’s comprehensive responses to an issue related to CSV loading that I filed early on in my use of the code. He has helped a number of other users as well.

Installation

You can of course install clickhouse-driver straight from Github but since releases are posted on pypi.org it’s far easier to use pip, like the example below. Just a note: examples are based on Python 3.7. This installation command includes lz4 compression, which can reduce data transfer sizes enormously.

For testing purposes it’s a best practice to use a virtual environment, which means the installation usually looks like the following example:

If you use Anaconda there is conveniently a clickhouse package in Anaconda Cloud. You can install it with the following command:

After doing this you can use clickhouse-driver in Jupyter Notebooks served up by Anaconda. We will dig more deeply into Anaconda integration in a future blog article. Meanwhile, this should get you started.

Documentation

One of the strengths of clickhouse-driver is excellent documentation. The docs provide a nice introduction to the code as well as detailed descriptions of the API. In fact, it was somewhat challenging to make useful code-level observations for this article because the documentation already covered API behavior so well.

The docs should probably be the first stop for new clickhouse-driver users but are easy to overlook initially, since they are referenced at the bottom of the project README.md. I only noticed them after writing a couple of test programs. It would be nice if the docs were published in the future using Github pages, which puts a prominent link at the top of the Github project. Once you find them, though, you’ll refer to them regularly.

Basic Operation

Clickhouse-driver is very simple to use. The main interface is the Client class, which most programs import directly.

To set up a connection you instantiate the class with appropriate arguments. Here’s the simplest example for a connection to a localhost server using the default ClickHouse user and unencrypted communications. This is sufficient for trivial tests.

Of course, real applications are more demanding. It’s typical to see something akin to the sample code below. It has a non-default user on a secure connection with self-signed certificates. The database is also different from the usual ‘default’. To top it off we are compressing data.

The option flexibility is great. In particular security options are robust and include basic features corporate InfoSec teams expect. With the foregoing options clickhouse-driver auto-negotiates to TLSv1.2 on a properly configured ClickHouse server. That meets current PCI standards among others. I was also very pleased to find easy support for self-signed certificates, which are common in test scenarios.

Creating a client sets up the connection information but does not actually touch the ClickHouse server. The connection is established when you invoke the Client.execute() method. Here’s an example of a simple SELECT, followed by some code to iterate through the query result so we can see how it is put together.

The output is shown below. It’s a list of tuples containing column values.
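The original code listing is not reproduced here, but a minimal sketch of this kind of SELECT and result iteration, assuming a local server, looks like:

```python
def select_demo():
    # Requires `pip install clickhouse-driver` and a reachable server.
    from clickhouse_driver import Client

    client = Client('localhost')
    result = client.execute(
        'SELECT number, toString(number) FROM system.numbers LIMIT 3')
    for row in result:  # result is a list of tuples of column values
        print(row)
```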

The result format has a couple of advantages. First, it’s easy to manipulate in Python. For example, you can just print any part of the output and it will show values, which is handy for debugging. Second, you can use values immediately rather than having to figure out conversions yourself. That’s handy because Python does not automatically perform even relatively simple coercions like str to int in numerical equations.

Let’s quickly tour operations to create a table, load some data, and fetch it back.

Data definition language (DDL) like CREATE TABLE uses a single string argument. The following example splits the string across lines for readability.

INSERT statements take an extra params argument to hold the values, as shown by the following example.
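A sketch of both operations using an Iris-style table (the exact schema in the article's listing is an assumption):

```python
def create_and_insert_demo():
    # Requires `pip install clickhouse-driver` and a reachable server.
    from clickhouse_driver import Client

    client = Client('localhost')

    # DDL uses a single string argument, split across lines for readability
    client.execute(
        'CREATE TABLE IF NOT EXISTS iris ('
        '  sepal_length Float64, sepal_width Float64,'
        '  petal_length Float64, petal_width Float64,'
        '  species String'
        ') ENGINE = MergeTree ORDER BY species')

    # INSERT takes the values as a separate params argument
    rows = [(5.1, 3.5, 1.4, 0.2, 'Iris-setosa'),
            (7.0, 3.2, 4.7, 1.4, 'Iris-versicolor')]
    client.execute('INSERT INTO iris VALUES', rows)
```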

The format for values is the same as the result format for SELECT statements. Clickhouse-driver uses a similar format in both directions. The INSERT params also support dictionary organization as well as generators, as we’ll see in a later section. See the docs for more insert examples.

We already showed an example of a SELECT statement using functions to generate output. Selecting out of a table looks pretty much the same, as shown by the following example.

Clickhouse-driver has a lot of useful features related to SELECTs. For instance, you can enable progress tracking using the Client.execute_with_progress() method, which is great when pulling down large result sets. Similarly the Client.execute_iter() method allows you to chunk results from large datasets to avoid overflowing memory. There’s even cancellation which covers you when somebody accidentally selects a few billion rows. Again, see the docs for examples.

One place where you need to be a little wary is prevention of SQL injection attacks. The procedure for query parameterization uses Python dictionary substitutions, as in the following example.
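A sketch of the dictionary-substitution style of parameterization (the iris table and value are illustrative):

```python
def parameterized_demo():
    # Requires `pip install clickhouse-driver` and a reachable server.
    from clickhouse_driver import Client

    client = Client('localhost')
    # %(species)s is substituted, with escaping, from the params dict
    return client.execute(
        'SELECT count() FROM iris WHERE species = %(species)s',
        {'species': 'Iris-setosa'})
```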

You might try to circumvent the substitution scheme by setting ‘species’ to a string like “‘Iris-setosa’ AND evil_function() = 0”. The clickhouse-driver cleverly foils this attack by escaping strings and other common data types before doing substitutions. The query ends up looking like the following, which may break but won’t call evil_function() unexpectedly.

This approach will protect you from run-of-the-mill villainy with strings, but there are ways around it. For instance, it appears possible to pass in Python object types that will not be escaped properly. (Check the driver code here to see why this might be so.) You should review substitution format strings carefully and also check Python parameter types at runtime to ensure something bad does not weasel through. That’s especially the case for Internet-facing applications.

A Deeper Look at the ClickHouse Wire Protocol

This is a good time to discuss what’s actually happening on the wire when communicating between the Python client and ClickHouse. To set context, ClickHouse has two wire protocols: HTTP protocol which uses simple PUT and POST operations to issue queries, and a native TCP/IP protocol that ships data as typed values. These run on different ports so there’s no confusion.

Clickhouse-driver uses the native TCP/IP protocol. This choice is better for Pythonistas because the native protocol knows about types and avoids loss of precision due to binary-to-string conversions. The implementation is correct, at least for the samples that I tried. That is an impressive accomplishment, because the documentation for the native protocol is the C++ implementation code.

As you go deeper into Python access to ClickHouse it’s helpful to understand what the TCP/IP protocol is actually doing. When you run a query, ClickHouse returns results in a binary block format that contains column results in a typed binary format. Here’s an example:

Unlike many databases, ClickHouse results are column-oriented (like the storage). This means that compression works well on query results just as it does on stored values. Compression is invisible to users but can vastly reduce network traffic.

Where ClickHouse differs from many other DBMS implementations is on upload. Let’s look at the INSERT statement again from the previous section.

This format may be a little confusing if you are used to executing INSERT statements as a single string, which is typical for many DBMS types. What you are seeing is a side-effect of the native TCP/IP wire protocol, which ships typed values in both directions. The data values use a column-oriented format, just like the query output.

The TCP/IP protocol has another curious effect, which is that sending INSERTs as a single string won’t even work in clickhouse-driver. It just hangs and will eventually time out.

What’s going on? The server has the first part of the INSERT and is now waiting for data from the client to complete the INSERT in the native protocol. Meanwhile, the client is waiting for the server to respond. This behavior is clearly documented in the clickhouse-driver documentation so one could argue it’s not a bug: you are doing something the protocol does not expect. I don’t completely agree with that view, mostly because it’s confusing to newcomers. This seems like a nice pull request for somebody to work on in future.

But wait, you might ask. The C++ clickhouse-client binary will process an INSERT like the one shown above. How can that possibly work? Well, the trick is that clickhouse-client runs the same code as the ClickHouse server and can parse the query on the client side. It extracts and sends the INSERT statement up to the VALUES clause, waits for the server to send back data types, then converts and sends the data as column-oriented blocks.

Overall the wire protocol is quite reasonable once you understand what is going on. Problems like hanging INSERTs are easy to avoid. If you have further questions I suggest firing up Wireshark and watching the packets on an unencrypted, uncompressed connection. It’s relatively easy to figure out what’s happening.

Loading CSV

Armed with a better understanding of what the clickhouse-driver is doing under the covers we can tackle a final topic: how to load CSV.

As we now know, you can’t just pipe raw CSV into the driver the way the clickhouse-client program does it. Fortunately, there’s an easy solution: you can parse CSV into a list of tuples, as shown in the following example.

This code works for the Iris dataset values used in this sample, which are relatively simple and automatically parse into types that load properly. For more diverse tables you may need to add additional logic to coerce types. Here’s another approach that works by assigning values in each line to a dictionary. It’s more complex but ensures types are correctly assigned. You can also rearrange the order of columns in the input and do other manipulations to clean up data.
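The dictionary-based approach can be sketched with the standard library alone; the column names and type coercions below are assumptions based on the Iris dataset:

```python
import csv
import io


def iris_rows(csv_text):
    """Parse Iris-style CSV into a list of dicts with coerced types."""
    reader = csv.DictReader(io.StringIO(csv_text), fieldnames=[
        'sepal_length', 'sepal_width', 'petal_length', 'petal_width',
        'species'])
    rows = []
    for line in reader:
        rows.append({
            # Coerce numeric fields explicitly so types load correctly
            'sepal_length': float(line['sepal_length']),
            'sepal_width': float(line['sepal_width']),
            'petal_length': float(line['petal_length']),
            'petal_width': float(line['petal_width']),
            'species': line['species'],
        })
    return rows


sample = "5.1,3.5,1.4,0.2,Iris-setosa\n4.9,3.0,1.4,0.2,Iris-setosa\n"
parsed = iris_rows(sample)
```

The parsed list of dicts can then be handed to the driver, e.g. `client.execute('INSERT INTO iris VALUES', parsed)`, assuming a table with matching column names.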

As files run into the 100s of megabytes or more you may want to consider alternatives to Python to get better throughput. Parsing and converting data in Python is relatively slow compared to the C++ clickhouse-client. I would recommend load testing any Python solution for large scale data ingest to ensure you don’t hit bottlenecks.

Summary and Acknowledgments

The clickhouse-driver is relatively young but it is very capable. I am impressed by the thoughtful design, quality of the implementation, and excellent documentation. It looks like a solid base for future Python work with ClickHouse. We’ll review more Python client solutions in the future but for new users, clickhouse-driver is a great place to start.

Thanks to Konstantin Lebedev for reviewing a draft of this article!

Originally published on the Altinity blog on February 1, 2019.

How to write data to ClickHouse with Python


How to quickly deploy ClickHouse with Docker

How to build your own ClickHouse image (docker-compose from the official repo)

The ClickHouse GitHub repository contains a docker-compose.yml file with the following contents:

This file builds the images and, once they are built, starts the containers. Three containers will start:


Note right away that this is not a very suitable way to use ClickHouse, since building the images takes a lot of time; it is better to use the ready-made official ClickHouse image.

But if you want to go down the path of building your own images, you will need to run the following commands on the server:

Installing ClickHouse from the official image on hub.docker.com

The official instructions for deploying the image are here: https://hub.docker.com/r/clickhouse/clickhouse-server/.

This downloads the official Docker image (but does not start it yet):


Next, start a container from the image:

To check that your ClickHouse container is running, run the command:


You can check that clickhouse-server is working by opening the URL http://localhost:8123/:


Next, run the command:

This command connects to ClickHouse through the native client:


Command-line client

Installation on Ubuntu:

Clients and servers of different versions are compatible with one another, but if the client is older than the server, some new features may be unavailable. It is recommended to use the same versions of the client and the server.

Video: “Installing the ClickHouse database as a Docker container”

Installing ClickHouse with docker-compose

Next, create a db folder where ClickHouse will store its files:

Next, create the docker-compose.yml file

Next, run the installation with docker-compose:

You can enter the ClickHouse client with the command:

Connecting to ClickHouse with DBeaver

DBeaver can be installed on Ubuntu through Ubuntu Software:


Select the ClickHouse connector:


Connection settings with the default user:


Run show databases as a check query against ClickHouse:


Interfaces for accessing ClickHouse

ClickHouse has a rich set of features for managing network connections, both for clients and for other servers in a cluster. Nevertheless, it can be hard for new users to work through the possible options, and hard for experienced users to give applications full access to deployed systems while keeping them properly secured.

ClickHouse provides three network interfaces (each can be wrapped in TLS for additional security):

In most cases it is recommended to use an appropriate tool or library rather than interacting with ClickHouse directly. Officially supported by Yandex:

There is also a wide range of third-party libraries for working with ClickHouse:

What is the HTTP interface

The HTTP interface lets you use ClickHouse on any platform, from any programming language. The HTTP interface is more limited than the native interface, but it is more compatible. By default, clickhouse-server listens for HTTP on port 8123. The query is sent as a URL parameter named query, or as the request body when using the POST method, or with the beginning of the query in the query URL parameter and the rest in the POST body. The URL size is limited to 16 KB; keep this in mind when sending large queries.

Port 8123 is the default HTTP interface endpoint. You will use this port if you send requests to the server with curl commands. In addition, a number of libraries, such as the Yandex ClickHouse JDBC driver, use HTTP requests under the hood, so you may be using the HTTP interface without even realizing it.
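The query-as-URL-parameter mechanism described above can be sketched with the standard library alone; the host is a placeholder for a running server:

```python
def http_query_demo():
    # Sends a query through the HTTP interface on port 8123; no driver
    # package is required, only the standard library.
    from urllib.parse import urlencode
    from urllib.request import urlopen

    url = 'http://localhost:8123/?' + urlencode({'query': 'SELECT 1'})
    with urlopen(url) as response:
        return response.read().decode()
```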

What is Native TCP (the native interface)

The native protocol is used in the command-line client, for server-to-server communication during distributed query processing, and in other C++ programs. Unfortunately, the native ClickHouse protocol does not yet have a formal specification.

Port 9000 is the Native TCP interface endpoint (by default). It is widely used by clients, as shown in the following examples.

What is gRPC

ClickHouse supports a gRPC interface. gRPC is an open-source remote procedure call system that uses HTTP/2 and Protocol Buffers.

gRPC is a powerful framework for remote procedure calls. RPC lets you write code as if it were running on the local computer, even though it may execute on another machine.

gRPC is generally considered a better alternative to REST for microservice architectures. The “g” in gRPC can be attributed to Google, which originally developed the technology. gRPC was created to overcome the limitations of REST for microservices.

gRPC is a modern framework built on the RPC model. It keeps RPC’s advantages while trying to fix the problems of traditional RPC. gRPC uses protocol buffers as its interface definition language and for serialization and communication, instead of JSON/XML.

Protocol buffers can describe the structure of data, and from that description code can be generated to produce or parse the byte stream representing the structured data. For this reason gRPC is preferable for multilingual web applications (implemented with different technologies). The binary data format makes communication lighter. gRPC can also be used with other data formats, but protocol buffers are preferred.

In addition, gRPC is built on HTTP/2, which supports bidirectional communication alongside traditional request/response. gRPC allows loose coupling between server and client. In practice, the client opens a long-lived connection to the gRPC server, and a new HTTP/2 stream is opened for each RPC call.

Unlike REST, which mostly uses JSON, gRPC uses protocol buffers, which are a better way to encode data. Because JSON is a text format, it is much heavier than compressed protobuf-format data.

Network Listener Configuration

ClickHouse makes it easy to enable and disable listener ports, and to assign them new numbers. For each port type there is a simple config.xml tag, as shown in the following table. The typical-value column shows the port number that most clients assume for a given connection type. If you change a value, you may need to change clients accordingly.

Tag | Description | Typical value
http_port | Port for unencrypted HTTP requests | 8123
https_port | Port for encrypted HTTPS requests | 8443
interserver_http_port | Port for unencrypted HTTP replication traffic | 9009
interserver_https_port | Port for encrypted HTTPS replication traffic |
tcp_port | Port for unencrypted native TCP/IP requests | 9000
tcp_port_secure | Port for TLS-encrypted native TCP/IP requests | 9440

How to create a database in ClickHouse, create a table, and insert test data

Go to DBeaver and run the scripts.

1. Create a database in ClickHouse

ClickHouse and Python: Getting to Know the Clickhouse-driver Client


Python is a force in the world of analytics due to powerful libraries like numpy along with a host of machine learning frameworks. ClickHouse is an increasingly popular store of data. As a Python data scientist you may wonder how to connect them.

Fortunately the Altinity Blog is here to solve mysteries, at least those that involve ClickHouse. This post contains a review of the clickhouse-driver client. It’s a solidly engineered module that is easy to use and integrates easily with standard tools like Jupyter Notebooks and Anaconda. Clickhouse-driver is a great way to jump into ClickHouse Python connectivity.

So Many Python Choices

The first hurdle for Python users is just picking a suitable driver. Even a quick search on pypi.org shows 22 projects with ClickHouse references. They include SQLAlchemy drivers (3 choices), async clients (also 3), and a Pandas-to-ClickHouse interface among others.

Clickhouse-driver offers a straightforward interface that enables Python clients to connect to ClickHouse, issue SELECT and DDL commands, and process results. It’s a good choice for direct Python connectivity with 16 published releases on pypi.org. The latest version is 0.0.17, published on January 10, 2019. If you want to connect to the data warehouse, issue SQL commands, and fetch back data, clickhouse-driver is a great place to start.

Code and Community

The clickhouse-driver source code is published on Github under an MIT license. The main committer is Konstantin Lebedev (@xzkostyan) though there have been a few contributions from others.

Konstantin is very responsive to questions about the driver, which you can register as issues. Much of my understanding of the wire protocol started from Konstantin’s comprehensive responses to an issue related to CSV loading that I filed early on in my use of the code. He has helped a number of other users as well.

Installation

You can of course install clickhouse-driver straight from Github but since releases are posted on pypi.org it’s far easier to use pip, like the example below. Just a note: examples are based on Python 3.7. This installation command includes lz4 compression, which can reduce data transfer sizes enormously.

For testing purposes it’s a best practice to use a virtual environment, which means the installation usually looks like the following example:

If you use Anaconda there is conveniently a clickhouse package in Anaconda Cloud. You can install it with the following command:

After doing this you can use clickhouse-driver in Jupyter Notebooks served up by Anaconda. We will dig more deeply into Anaconda integration in a future blog article. Meanwhile this should get you started.

Documentation

One of the strengths of clickhouse-driver is excellent documentation. The docs provide a nice introduction to the code as well as detailed descriptions of the API. In fact, it was somewhat challenging to make useful code-level observations for this article because the documentation already covered API behavior so well.

The docs should probably be the first stop for new clickhouse-driver users but are easy to overlook initially since they are referenced at the bottom of the project README.md. I only noticed them after writing a couple of test programs. It would be nice if docs were published in future using Github pages, which puts a prominent link on the top of the Github project. Once you find them though you’ll refer to them regularly.

Basic Operation

Clickhouse-driver is very simple to use. The main interface is the Client class, which most programs import directly.

To set up a connection you instantiate the class with appropriate arguments. Here’s the simplest example for a connection to a localhost server using the default ClickHouse user and unencrypted communications. This is sufficient for trivial tests.

Of course real applications are more demanding. It’s typical to see something akin to the sample code below. It has a non-default user on a secure connection with self-signed certificates. The database is also different from the usual ‘default’. To top it off we are compressing data.

The option flexibility is great. In particular security options are robust and include basic features corporate InfoSec teams expect. With the foregoing options clickhouse-driver auto-negotiates to TLSv1.2 on a properly configured ClickHouse server. That meets current PCI standards among others. I was also very pleased to find easy support for self-signed certificates, which are common in test scenarios.

Creating a client sets up the connection information but does not actually touch the ClickHouse server. The connection is established when you invoke the Client.execute() method. Here’s an example of a simple SELECT, followed by some code to iterate through the query result so we can see how it is put together.

The output is shown below. It’s a list of tuples containing column values.

The result format has a couple of advantages. First, it’s easy to manipulate in Python. For example you can just print any part of the output and it will show values, which is handy for debugging. Second, you can use values immediately rather than having to figure out conversions yourself. That’s handy because Python does not automatically do even relatively simple coercions like str to int in numerical equations.

Let’s quickly tour operations to create a table, load some data, and fetch it back.

Data definition language (DDL) like CREATE TABLE uses a single string argument. The following example splits the string across lines for readability.
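A sketch of such a DDL string, using an illustrative iris table:

```python
# DDL goes to execute() as one string; a triple-quoted literal keeps the
# statement readable. Table and column names here are illustrative.
ddl = '''
CREATE TABLE IF NOT EXISTS iris (
    sepal_length Float64,
    sepal_width  Float64,
    petal_length Float64,
    petal_width  Float64,
    species      String
) ENGINE = MergeTree() ORDER BY species
'''

# client.execute(ddl)  # requires a connected Client instance
```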

INSERT statements take an extra params argument to hold the values, as shown by the following example.

The format for values is the same as the result format for SELECT statements. Clickhouse-driver uses a similar format in both directions. The INSERT params also support dictionary organization as well as generators, as we’ll see in a later section. See the docs for more insert examples.
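For illustration, the three shapes of INSERT params the text mentions (tuples, dicts, and a generator) might look like this; the commented execute() calls assume a connected Client and an iris table:

```python
# Rows as tuples, the same shape execute() returns for SELECTs:
rows_as_tuples = [
    (5.1, 3.5, 1.4, 0.2, 'Iris-setosa'),
    (4.9, 3.0, 1.4, 0.2, 'Iris-setosa'),
]

# Rows as dicts, keyed by column name:
rows_as_dicts = [
    {'sepal_length': 5.1, 'sepal_width': 3.5, 'petal_length': 1.4,
     'petal_width': 0.2, 'species': 'Iris-setosa'},
]

# A generator works too and avoids materializing everything in memory:
def stream_rows(source):
    for row in source:
        yield row

# client.execute('INSERT INTO iris VALUES', rows_as_tuples)
# client.execute('INSERT INTO iris VALUES', stream_rows(rows_as_tuples))
```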

We already showed an example of a SELECT statement using functions to generate output. Selecting out of a table looks pretty much the same, as shown by the following example.

Clickhouse-driver has a lot of useful features related to SELECTs. For instance, you can enable progress tracking using the Client.execute_with_progress() method, which is great when pulling down large result sets. Similarly the Client.execute_iter() method allows you to chunk results from large datasets to avoid overflowing memory. There’s even cancellation which covers you when somebody accidentally selects a few billion rows. Again, see the docs for examples.

One place where you need to be a little wary is prevention of SQL injection attacks. The procedure for query parameterization uses Python dictionary substitutions, as in the following example.

You might try to circumvent the substitution scheme by setting ‘species’ to a string like “‘Iris-setosa’ AND evil_function() = 0”. The clickhouse-driver cleverly foils this attack by escaping strings and other common data types before doing substitutions. The query ends up looking like the following, which may break but won’t call evil_function() unexpectedly.
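The idea behind the escaping can be sketched in plain Python (this illustrates the principle, not the driver's actual implementation):

```python
def escape_str(value):
    # Quote a string literal and escape backslashes and quotes, roughly as
    # clickhouse-driver does before substituting %(name)s placeholders.
    escaped = value.replace('\\', '\\\\').replace("'", "\\'")
    return "'" + escaped + "'"

query = 'SELECT * FROM iris WHERE species = %(species)s'
evil = "'Iris-setosa' AND evil_function() = 0"

# With escaping applied, the injected quotes stay inert inside one literal:
print(query % {'species': escape_str(evil)})
# SELECT * FROM iris WHERE species = '\'Iris-setosa\' AND evil_function() = 0'
```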

This approach will protect you from run-of-the-mill villainy with strings but there are ways around it. For instance, it appears possible to pass in Python object types that will not be escaped properly. (Check the driver code here to see why this might be so.) You should review substitution format strings carefully and also check Python parameter types at runtime to ensure something bad does not weasel through. That’s especially the case for Internet-facing applications.

A Deeper Look at the ClickHouse Wire Protocol

This is a good time to discuss what’s actually happening on the wire when communicating between the Python client and ClickHouse. To set context, ClickHouse has two wire protocols: HTTP protocol which uses simple PUT and POST operations to issue queries, and a native TCP/IP protocol that ships data as typed values. These run on different ports so there’s no confusion.

Clickhouse-driver uses the native TCP/IP protocol. This choice is better for Pythonistas because the native protocol knows about types and avoids loss of precision due to binary-to-string conversions. The implementation is correct, at least for the samples that I tried. That is an impressive accomplishment, because the documentation for the native protocol is the C++ implementation code.

As you go deeper into Python access to ClickHouse it’s helpful to understand what the TCP/IP protocol is actually doing. When you run a query, ClickHouse returns results in a binary block format that contains column results in a typed binary format. Here’s an example:
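The block dump itself was not preserved here, but the key property, column orientation, can be illustrated in plain Python: a column-oriented block is simply the transpose of the row tuples.

```python
# Rows as clickhouse-driver returns them by default (columnar=False):
rows = [
    (5.1, 'Iris-setosa'),
    (7.0, 'Iris-versicolor'),
]

# The same data as it travels on the wire: one typed vector per column.
columns = list(zip(*rows))
print(columns)
# [(5.1, 7.0), ('Iris-setosa', 'Iris-versicolor')]
```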

Unlike many databases ClickHouse results are column-oriented (like the storage). This means that compression works well on query results just as it does on stored values. Compression is invisible to users but can vastly reduce network traffic.

Where ClickHouse differs from many other DBMS implementations is on upload. Let’s look at the INSERT statement again from the previous section.

This format may be a little confusing if you are used to executing INSERT statements as a single string, which is typical for many DBMS types. What you are seeing is a side-effect of the native TCP/IP wire protocol, which ships typed values in both directions. The data values use a column-oriented format, just like the query output.

The TCP/IP protocol has another curious effect, which is that sending INSERTs as a single string won’t even work in clickhouse-driver. It just hangs and will eventually time out.

What’s going on? The server has the first part of the INSERT and is now waiting for data from the client to complete the INSERT in the native protocol. Meanwhile, the client is waiting for the server to respond. This behavior is clearly documented in the clickhouse-driver documentation so one could argue it’s not a bug: you are doing something the protocol does not expect. I don’t completely agree with that view, mostly because it’s confusing to newcomers. This seems like a nice pull request for somebody to work on in future.

But wait, you might ask. The C++ clickhouse-client binary will process an INSERT like the one shown above. How can that possibly work? Well, the trick is that clickhouse-client runs the same code as the ClickHouse server and can parse the query on the client side. It extracts and sends the INSERT statement up to the VALUES clause, waits for the server to send back data types, then converts and sends the data as column-oriented blocks.

Overall the wire protocol is quite reasonable once you understand what is going on. Problems like hanging INSERTs are easy to avoid. If you have further questions I suggest firing up Wireshark and watching the packets on an unencrypted, uncompressed connection. It’s relatively easy to figure out what’s happening.

Loading CSV

Armed with a better understanding of what the clickhouse-driver is doing under the covers we can tackle a final topic: how to load CSV.

As we now know you can’t just pipe raw CSV into the driver the way that the clickhouse-client program does it. Fortunately, there’s an easy solution. You can parse CSV into a list of tuples as shown in the following example.
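A sketch of such a parser using the standard csv module (the iris column layout is assumed):

```python
import csv

def iris_rows(path):
    # Parse CSV into the list-of-tuples shape that execute() accepts as
    # INSERT params; float() handles the four measurement columns.
    with open(path) as f:
        return [
            (float(r[0]), float(r[1]), float(r[2]), float(r[3]), r[4])
            for r in csv.reader(f) if r
        ]

# client.execute('INSERT INTO iris VALUES', iris_rows('iris.csv'))
```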

This code works for the Iris dataset values used in this sample, which are relatively simple and automatically parse into types that load properly. For more diverse tables you may need to add additional logic to coerce types. Here’s another approach that works by assigning values in each line to a dictionary. It’s more complex but ensures types are correctly assigned. You can also rearrange the order of columns in the input and do other manipulations to clean up data.
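A sketch of the dictionary-based approach (column names and coercions are illustrative):

```python
import csv

COLUMNS = ['sepal_length', 'sepal_width', 'petal_length',
           'petal_width', 'species']
COERCE = {'sepal_length': float, 'sepal_width': float,
          'petal_length': float, 'petal_width': float, 'species': str}

def iris_dicts(lines):
    # Name every field and coerce it explicitly, so a malformed value fails
    # loudly here rather than deep inside the driver. Reordering COLUMNS is
    # all it takes to handle input files with a different column order.
    for raw in csv.reader(lines):
        if raw:
            yield {name: COERCE[name](value)
                   for name, value in zip(COLUMNS, raw)}

# client.execute('INSERT INTO iris VALUES', iris_dicts(open('iris.csv')))
```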

As files run into the 100s of megabytes or more you may want to consider alternatives to Python to get better throughput. Parsing and converting data in Python is relatively slow compared to the C++ clickhouse-client. I would recommend load testing any Python solution for large scale data ingest to ensure you don’t hit bottlenecks.

Summary and Acknowledgments

The clickhouse-driver is relatively young but it is very capable. I am impressed by the thoughtful design, quality of the implementation, and excellent documentation. It looks like a solid base for future Python work with ClickHouse. We’ll review more Python client solutions in the future but for new users clickhouse-driver is a great place to start.

Thanks to Konstantin Lebedev for reviewing a draft of this article!

Python clickhouse driver


This page gives a good introduction to clickhouse-driver. It assumes you already have clickhouse-driver installed. If you do not, head over to the :ref:`installation` section.

A minimal working example looks like this:

This code will show all tables from the ‘default’ database.

There are two conceptual types of queries:

Every query should be executed by calling one of the client’s execute methods: execute, execute_with_progress, or execute_iter.

Simple select query looks like:

Of course queries can and should be parameterized to avoid SQL injections:

Percent symbols in inlined constants should be doubled if you mix constants with % symbol and %(myvar)s parameters.
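The rule can be illustrated with plain Python %-formatting, which is what the driver applies internally:

```python
# A literal % in the query text must be doubled when %(name)s parameters
# are also present, because the driver runs %-style substitution:
query = "SELECT 1 FROM iris WHERE species LIKE '%%setosa' AND id = %(id)s"
print(query % {'id': 42})
# SELECT 1 FROM iris WHERE species LIKE '%setosa' AND id = 42
```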

NOTE: formatting queries using Python’s f-strings or concatenation can lead to SQL injections. Use %(myvar)s parameters instead.

Customising SELECT output with the FORMAT clause is not supported.

Selecting data with progress statistics

You can get query progress statistics by using execute_with_progress. It can be useful for cancelling long queries.

When you are dealing with large datasets, block-by-block streaming of results may be useful:

Insert queries in Native protocol are a little bit tricky because of ClickHouse’s columnar nature. And because we’re using Python.

INSERT query consists of two parts: query statement and query values. Query values are split into chunks called blocks. Each block is sent in binary columnar form.

Because data in each block is sent in binary form, we should not serialize it into a string via %(a)s substitution and then deserialize it back into Python types.

This INSERT would be extremely slow if executed with thousands of rows of data:

To insert data efficiently, provide data separately, and end your statement with a VALUES clause:

You can use any iterable yielding lists, tuples or dicts.
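The two stripped examples can be sketched as follows (table and column names are illustrative; the commented call assumes a connected Client):

```python
rows = [(0, 'a'), (1, 'b')]

# Slow: every value is serialized into the query string via %(...)s
# substitution, then parsed back by the server row by row.
slow_query = 'INSERT INTO test (x, s) VALUES (%(x)s, %(s)s)'

# Fast: the statement ends at the VALUES clause and the rows travel
# separately, as typed column-oriented blocks.
fast_query = 'INSERT INTO test (x, s) VALUES'
# client.execute(fast_query, rows)
```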

If data is not passed, the connection will be terminated after a timeout.

The following WILL NOT work:

For INSERT … SELECT queries, however, no data is needed: ClickHouse will execute such a query like a usual SELECT query.

Inserting data in different formats with FORMAT clause is not supported.

See :ref:`insert-from-csv-file` if you need to load data in a custom format.

DDL queries can be executed in the same way SELECT queries are executed:

Async and multithreading

Every ClickHouse query is assigned an identifier to enable request execution tracking. However, ClickHouse native protocol is synchronous: all incoming queries are executed consecutively. Clickhouse-driver does not yet implement a connection pool.

To utilize ClickHouse’s asynchronous capability you should either use multiple Client instances or implement a queue.

The same applies to multithreading. Queries from different threads can’t share one Client instance with a single connection. You should use different clients for different threads.

However, if you are using the DB API for communication with the server, each cursor creates its own Client instance. This makes communication thread-safe.


clickhouse-driver

Python driver for ClickHouse


This part of the documentation covers basic classes of the driver: Client, Connection and others.

Client

Client for communication with the ClickHouse server. A single connection is established per instance of the client.

The following keys when passed in settings are used for configuring the client itself:

Disconnects from the server.

execute(query, params=None, with_column_types=False, external_tables=None, query_id=None, settings=None, types_check=False, columnar=False)

Establishes a new connection if one hasn’t been established yet. After query execution the connection remains intact for subsequent queries. If the connection can’t be reused, it is closed and a new one is created.

execute_iter(query, params=None, with_column_types=False, external_tables=None, query_id=None, settings=None, types_check=False)

New in version 0.0.14.

execute_with_progress(query, params=None, with_column_types=False, external_tables=None, query_id=None, settings=None, types_check=False, columnar=False)

classmethod from_url(url)

Return a client configured from the given URL.

Any additional querystring arguments will be passed along to the Connection class’s initializer.

insert_dataframe(query, dataframe, external_tables=None, query_id=None, settings=None)

New in version 0.2.0.

Inserts a pandas DataFrame with the specified query.

Returns: the number of inserted rows.

query_dataframe(query, params=None, external_tables=None, query_id=None, settings=None)

New in version 0.2.0.

Queries a DataFrame with the specified SELECT query.

Connection

Represents connection between client and ClickHouse server.

Closes connection between server and client. Frees resources: e.g. closes socket.

QueryResult

Stores query result from multiple blocks.

get_result()

Returns: stored query result.

ProgressQueryResult

Stores query result and progress information from multiple blocks. Provides iteration over query progress.

get_result()

Returns: stored query result.

IterQueryResult

Provides iteration over returned data by chunks (streaming by chunks).

Immowelt/PyClickhouse


Minimalist Clickhouse Python driver with an API roughly resembling Python DB API 2.0 specification.

To develop or run anything in this project, it is recommended to set up a virtual environment using the provided Pipfile:

This will recreate the virtual environment as well, if necessary.

Makefile and running tests

The Makefile target test is provided to run the project’s tests. These require access to a running instance of Clickhouse, which is provided through docker. This assumes that docker is installed and the current user can use it without sudo.

A one-liner to run the tests in the virtual environment would be:


ClickHouse/clickhouse-connect


A suite of Python packages for connecting Python to ClickHouse, initially supporting Apache Superset using a minimal read only SQLAlchemy dialect. Uses the ClickHouse HTTP interface.

Installation

ClickHouse Connect requires Python 3.7 or higher. The cython package must be installed prior to installing clickhouse_connect to build and install the optional Cython/C extensions used for improving read and write performance using the ClickHouse Native format. After installing cython if desired, clone this repository and run python setup.py install from the project directory.

Getting Started

Simple ‘command’ that does not return a result set.

Bulk insert of a matrix of rows and columns.

Minimal SQLAlchemy Support

On installation ClickHouse Connect registers the clickhousedb SQLAlchemy Dialect entry point. This dialect supports basic table reflection for table columns and datatypes, and command and query execution using DB API 2.0 cursors. Most ClickHouse datatypes have full query/cursor support.

ClickHouse Connect does not yet implement the full SQLAlchemy API for DDL (Data Definition Language) or ORM (Object Relational Mapping). These features are in development.

Superset Support

On installation ClickHouse Connect registers the clickhousedb Superset Database Engine Spec entry point. Using the clickhousedb SQLAlchemy dialect, the engine spec supports complete data exploration and Superset SQL Lab functionality with all standard ClickHouse data types. See Connecting Superset for complete instructions.

ClickHouse Enum, UUID, and IP Address datatypes are treated as strings. For compatibility with Superset Pandas dataframes, unsigned UInt64 data types are interpreted as signed Int64 values. ClickHouse CSV Upload via Superset is not yet implemented.

SQLAlchemy and Superset require the corresponding SQLAlchemy and Apache Superset packages to be included in your Python installation. ClickHouse connect also includes C/Cython extensions for improved performance reading String and FixedString datatypes. These extensions will be installed automatically by setup.py if a C compiler is available.

Query results can be returned as either a numpy array or a pandas DataFrame if the numpy and pandas libraries are available. Use the client methods query_np and query_df respectively.

Main Client Interface

Interaction with the ClickHouse server is done through a clickhouse_connect Client instance. At this point only an HTTP(s) based Client is supported.

HTTP Client constructor/initialization parameters

Create a ClickHouse client using the clickhouse_connect.driver.create_client(...) function or the clickhouse_connect.get_client(...) wrapper. All parameters are optional:

Any remaining keyword parameters are interpreted as ‘setting’ parameters to send to the ClickHouse server with every query/request

Use the client query method to retrieve a QueryResult from ClickHouse. Parameters:

The query method returns a QueryResult object with the following fields:

Numpy and Pandas queries

Datatype options for queries

There are some convenience methods in the clickhouse_connect.driver package that control the format of some ClickHouse datatypes. These are included in part to improve Superset compatibility.

Use the client insert method to insert data into a ClickHouse table. Parameters:

Notes on data inserts

The client insert_df can be used to insert a Pandas DataFrame, assuming the column names in the DataFrame match the ClickHouse table column names. Note that a Numpy array can be passed directly as the data parameter to the primary insert method so there is no separate insert_np method.

For column types that can map to different native Python types (for example, UUIDs or IP Addresses), the driver will assume that the data type for the whole column matches the first non-None value in the column and process insert data accordingly. So if the first data value for an insert into a ClickHouse UUID column is a string, the driver will assume all data values in that insert column are strings.

DDL and other "simple" SQL statements

The client command method can be used for ClickHouse commands/queries that return a single result or a single row of result values. In this case the result is returned as a single row of TabSeparated values and cast to a string, an int, or a list of string values. The command method parameters are:

About

Python driver/sqlalchemy/superset connectors

clickhouse-driver

Python driver for ClickHouse


Features

External data for query processing

You can pass external data along with the query:

Settings

There are a lot of ClickHouse server settings. Settings can be specified during Client initialization:

Each setting can be overridden in an execute statement:

Compression

Client with compression support can be constructed as follows:

CityHash algorithm notes

Unfortunately the ClickHouse server ships with a built-in old version of the CityHash algorithm (1.0.2). That’s why we can’t use the original CityHash package. An older version is published separately on PyPI.

Secure connection

Specifying query id

You can manually set a query identifier for each query, a UUID for example:

You can cancel a query with a specific id by sending another query with the same query id, if the option replace_running_query is set to 1.

Query results are fetched by the same instance of Client that emitted the query.
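Both points can be sketched with the standard uuid module; the commented execute() calls assume connected Client instances:

```python
from uuid import uuid4

# Generate a unique identifier and tag the query with it:
query_id = str(uuid4())
# client.execute('SELECT sleep(3)', query_id=query_id)

# From a second client, sending another query with the same query_id and
# replace_running_query=1 cancels the one above:
# other_client.execute('SELECT 1', query_id=query_id,
#                      settings={'replace_running_query': 1})
```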

Retrieving results in columnar form

The columnar form can sometimes be more useful.

Data type checking on INSERT

Data type checking is disabled on INSERT queries for performance. You can turn it on with the types_check option:

Query execution statistics

The client stores statistics about the last query execution. They can be obtained by accessing the last_query attribute. Statistics are sent by the ClickHouse server and calculated on the client side. last_query contains info about:

profile: rows before limit

Receiving server logs

Query logs can be received from server by using send_logs_level setting:

Multiple hosts

New in version 0.1.3.

This option is good for a ClickHouse cluster with multiple replicas.

In the example above, on every new connection the driver will try the following sequence of hosts if the previous host is unavailable:

All queries within established connection will be sent to the same host.

Python DB API 2.0

New in version 0.1.3.

This driver also implements the DB API 2.0 specification. It can be useful for various integrations.

Threads may share the module and connections.

The Connection class is just a wrapper for handling multiple cursors (clients) and does not initiate an actual connection to the ClickHouse server.

There are some non-standard ClickHouse-related Cursor methods for external data, settings, etc.

For automatic disposal Connection and Cursor instances can be used as context managers:

Python clickhouse driver

Clickhouse-driver supports Python 3.4 and newer and PyPy.

Starting from version 0.1.0, gcc, python and linux headers are required to build from source.

Example for python:alpine docker image:

By default there are wheels for Linux, Mac OS X and Windows.

Starting from version 0.2.3 there are wheels for musl-based Linux distributions.

These distributions will be installed automatically when installing clickhouse-driver.

These distributions will not be installed automatically. Clickhouse-driver will detect and use them if you install them.

Installation from PyPI

The package can be installed using pip:

You can install extra packages if you need compression support. Example of installing the LZ4 compression requirements:

You can also specify multiple extras, separated by commas. Install the LZ4 and ZSTD requirements:

You can install additional packages (NumPy and Pandas) if you need NumPy support:

Supported NumPy versions are limited by the numpy package’s own Python version support.

Installation from GitHub

The development version can be installed directly from GitHub:

long2ice/asynch



asynch is an asyncio ClickHouse Python driver with native (TCP) interface support, which reuses most of clickhouse-driver and complies with PEP 249.

Connect to ClickHouse

Create table by sql

Use DictCursor to get result with dict

Insert data with dict

Insert data with tuple

Use connection pool

This project is licensed under the Apache-2.0 License.


aio-clickhouse 0.0.5

pip install aio-clickhouse Copy PIP instructions

Released: Nov 19, 2021

Library for accessing a ClickHouse database over the native interface from asyncio.


License: MIT License (MIT)

Tags ClickHouse, db, database, cloud, analytics, asyncio


aioch

aioch is a library for accessing a ClickHouse database over the native interface from asyncio. It wraps the features of clickhouse-driver for asynchronous use.

Installation

The package can be installed using pip: pip install aioch

Usage

For more information see clickhouse-driver usage examples.

Parameters

Other parameters are passed to the wrapped clickhouse-driver Client.

License

aioch is distributed under the MIT license.


clickhouse-driver

Python driver for ClickHouse


Welcome to clickhouse-driver

Welcome to clickhouse-driver's documentation. Get started with Installation and then get an overview with the Quickstart, where common queries are described.

User's Guide

This part of the documentation focuses on step-by-step instructions for development with clickhouse-driver.

Clickhouse-driver is designed to communicate with the ClickHouse server from Python over the native protocol.

The ClickHouse server provides two protocols for communication: the HTTP protocol and the native (TCP) protocol.

Each protocol has its own advantages and disadvantages. Here we focus on the advantages of the native protocol:

There is an asynchronous wrapper for clickhouse-driver: aioch. It’s available here.

API Reference

If you are looking for information on a specific function, class or method, this part of the documentation is for you.

Additional Notes

Legal information, changelog and contributing are here for the interested.

pavelmaksimov/clickhousepy


A Python wrapper for ClickHouse database queries.

The wrapper is built around clickhouse-driver.

Written in Python 3.5.

Getting Data from Clickhouse in Pandas Dataframe Format

Brief documentation of some methods

Method of copying data from one table to another with checking the number of rows after copying

A method of copying data from one table to another while removing duplicate rows.

You can contact me on Telegram or Facebook.

Good luck, friend! Leave a star 😉


maximdanilchenko/aiochclient


An async http(s) ClickHouse client for python 3.6+ supporting type conversion in both directions, streaming, lazy decoding on select queries, and a fully typed interface.

Table of Contents

You can use it with either aiohttp or httpx http connectors.

To use it with aiohttp, install it with: pip install aiochclient[aiohttp]

Or aiochclient[aiohttp-speedups] to install with extra speedups.

To use it with httpx, install it with: pip install aiochclient[httpx]

Or aiochclient[httpx-speedups] to install with extra speedups.

Installing with [*-speedups] adds the following:

Additionally the installation process attempts to use Cython for a speed boost (roughly 30% faster).

Connecting to ClickHouse

aiochclient needs aiohttp.ClientSession or httpx.AsyncClient to connect to ClickHouse:

Querying the database

For fetching all rows at once use the fetch method:

For fetching first row from result use the fetchrow method:

You can also use fetchval method, which returns first value of the first row from query result:

With async iteration on the query results stream you can fetch multiple rows without loading them all into memory at once:

Use fetch / fetchrow / fetchval / iterate for SELECT queries, and execute (or any of the above) for INSERT and all other queries.
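The four fetch styles side by side, as a hedged sketch (assumes a reachable ClickHouse HTTP endpoint and an existing table t; table and query names are illustrative):

```python
import asyncio

import aiohttp
from aiochclient import ChClient


async def main():
    async with aiohttp.ClientSession() as session:
        client = ChClient(session)  # defaults to http://localhost:8123

        all_rows = await client.fetch("SELECT * FROM t")          # every row
        first = await client.fetchrow("SELECT * FROM t LIMIT 1")  # one row
        count = await client.fetchval("SELECT count() FROM t")    # one value

        # Stream rows without holding the full result set in memory.
        async for row in client.iterate("SELECT number FROM system.numbers LIMIT 5"):
            print(row[0])


asyncio.run(main())
```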

Working with query results

All fetch queries return rows as lightweight, memory-efficient objects. Before v1.0.0, rows were only returned as tuples. All rows have a full mapping interface, so you can get fields by name or by index:

To check out the API docs, visit the readthedocs site.

aiochclient automatically converts types from ClickHouse to Python types and vice versa.

Connection Pool Settings

aiochclient uses the aiohttp.TCPConnector to determine pool size. By default, the pool limit is 100 open connections.

It's highly recommended to use uvloop and install aiochclient with speedups for the sake of speed. Some recent benchmarks on our machines, without parallelization:

Note: these benchmarks are system dependent


mymarilyn/clickhouse-driver


ClickHouse Python Driver

ClickHouse Python Driver with native (TCP) interface support.

Asynchronous wrapper is available here: https://github.com/mymarilyn/aioch

There are two ways to communicate with the server:

Pure Client example:
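The example code was stripped during extraction; per the project README, it looks roughly like this (assumes a ClickHouse server on localhost:9000):

```python
from clickhouse_driver import Client

client = Client('localhost')

print(client.execute('SHOW DATABASES'))
```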

ClickHouse Python Driver is distributed under the MIT license.


Speeding Up the ClickHouse Driver

ClickHouse is the fastest analytical DBMS in the world. If you are not familiar with it yet, I strongly recommend trying it; you will not want to go back to MySQL or Postgres afterwards.

Data is usually stored in ClickHouse in raw, unaggregated form and is aggregated on the fly while executing SQL queries. But data science tasks often require exporting exactly that raw data for further in-memory processing (for example, to train a model on it). If you export the data to a text file with the native ClickHouse client, everything is fast enough: "ClickHouse doesn't slow down"™. But if you use the Python driver, the export drags on. Why?

Python represents every number as an object. This means the driver walks over the downloaded data, converts each number into an object, and then assembles a Python list (of pointers) out of those objects. This operation is called boxing, and on large volumes of data it takes significant time. In fact, while loading data through the Python driver, the CPU spends most of its time repacking numbers from their machine representation into objects.

At the same time, data science work is normally done with numpy arrays (pandas also works through numpy), which store numbers in machine representation, as in C. So first we spend a long time packing numbers into objects, and then, when converting the Python list to a numpy array, we unpack the objects back into numbers (unboxing). Clearly, the intermediate object representation only gets in the way: if the driver could download data directly into numpy arrays, the process would go much faster. The driver cannot do that, so I modified it slightly to add this capability.

Installation

Usage

data will contain a set of columns. Columns holding numbers or timestamps will be numpy arrays; the remaining columns (for example, strings) will be standard Python lists. The following ClickHouse types are converted to numpy format: Int8/16/32/64, UInt8/16/32/64, DateTime.

The resulting data is often converted into a pandas DataFrame with column names matching those in the database. To avoid doing this manually every time, a query_dataframe() method was added to the Client class:

The result is a DataFrame with two columns, a and b.

Benchmarks

We measured the execution time of SELECT x1, x2, … xn FROM table on a table with 100 million rows (real data from the Yandex.Metrica Logs API), engine=MergeTree. The queries were executed against a local ClickHouse with default driver settings.

| Query             | Time, numpy | Time, standard | Speedup | Memory, numpy | Memory, standard |
|-------------------|-------------|----------------|---------|---------------|------------------|
| 4 Int8 columns    | 0.34 s      | 5.8 s          | ×17     | 0.82 GB       | 3.3 GB           |
| 2 Int64 columns   | 1.38 s      | 12 s           | ×8.7    | 2.61 GB       | 9.7 GB           |
| 1 DateTime column | 12.1 s      | 7.1 m          | ×35     | 1.16 GB       | 4.8 GB           |

Using numpy speeds up reads by an order of magnitude. The speedup is especially noticeable for DateTime, because working with time as Python datetime objects is very slow. In fact, without numpy the execution time of a query that includes a time column goes beyond all reason.

The last two columns show the memory used by the process after executing the query. Using numpy not only speeds up data loading but also reduces the required memory roughly fourfold.

Limitations

The read limitations do not interfere with the driver's operation in any way: for some data types reading is simply accelerated, while for others it works as usual.

clickhouse-driver API Reference

This part of the documentation covers basic classes of the driver: Client, Connection and others.

Client

Client for communication with the ClickHouse server. A single connection is established per connected instance of the client.

Parameters: settings – dictionary of settings passed with every query. Defaults to None (no additional settings). See all available settings in the ClickHouse docs.

Disconnects from the server.

execute(query, params=None, with_column_types=False, external_tables=None, query_id=None, settings=None, types_check=False, columnar=False)

Establishes a new connection if one is not already open. After query execution the connection remains open for subsequent queries. If the connection cannot be reused, it is closed and a new one is created.

execute_iter(query, params=None, with_column_types=False, external_tables=None, query_id=None, settings=None, types_check=False)

New in version 0.0.14.

execute_with_progress(query, params=None, with_column_types=False, external_tables=None, query_id=None, settings=None, types_check=False, columnar=False)

classmethod from_url(url)

Return a client configured from the given URL.

Any additional querystring arguments will be passed along to the Connection class's initializer.

Connection

Represents a connection between the client and the ClickHouse server.

Closes the connection between server and client and frees resources (e.g. closes the socket).

QueryResult

Stores query results from multiple blocks.

get_result()

Returns: stored query result.

ProgressQueryResult

Stores query results and progress information from multiple blocks. Provides iteration over query progress.

get_result()

Returns: stored query result.

IterQueryResult

Provides iteration over the returned data in chunks (streaming by chunks).


Features

External data for query processing

You can pass external data along with a query:

Settings

There are many ClickHouse server settings. Settings can be specified during Client initialization:

Each setting can be overridden in an execute statement:
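The override rule itself can be sketched with plain dicts (the setting names below are just examples):

```python
# Settings given to Client(...) apply to every query the client runs.
client_settings = {'max_block_size': 100000, 'max_threads': 2}

# Settings passed to a single execute(...) call override the client-level ones.
execute_settings = {'max_block_size': 50000}

# Effective settings for that one call: per-call values win on conflict.
effective = {**client_settings, **execute_settings}
print(effective)  # {'max_block_size': 50000, 'max_threads': 2}
```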

Compression

A client with compression support can be constructed as follows:

CityHash algorithm notes

Unfortunately, the ClickHouse server ships with an old built-in version of the CityHash algorithm (1.0.2). That is why the original CityHash package cannot be used. The older version is published separately on PyPI.

Secure connection

Specifying query id

You can manually set a query identifier for each query, for example a UUID:

You can cancel a query with a specific id by sending another query with the same query id, provided the replace_running_query option is set to 1.

Query results are fetched by the same instance of Client that emitted the query.

Retrieving results in columnar form

Columnar form can sometimes be more useful.

Data type checking on INSERT

Data type checking is disabled on INSERT queries for performance reasons. You can turn it on with the types_check option:

Query execution statistics

The client stores statistics about the last query execution, available via the last_query attribute. Statistics are sent by the ClickHouse server and calculated on the client side. last_query contains info about:

profile: rows before limit

Receiving server logs

Query logs can be received from the server by using the send_logs_level setting:

Multiple hosts

New in version 0.1.3.

This option is good for a ClickHouse cluster with multiple replicas.

In the example above, on every new connection the driver will try the following sequence of hosts, moving on whenever the previous host is unavailable:

All queries within an established connection are sent to the same host.

Python DB API 2.0

New in version 0.1.3.

This driver also implements the DB API 2.0 specification. It can be useful for various integrations.

Threads may share the module and connections.

The Connection class is just a wrapper for handling multiple cursors (clients) and does not itself initiate an actual connection to the ClickHouse server.

There are some non-standard ClickHouse-related Cursor methods for external data, settings, etc.

For automatic disposal, Connection and Cursor instances can be used as context managers:
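A hedged sketch of the context-manager form (assumes a server on localhost; the DSN is illustrative):

```python
from clickhouse_driver import connect

# Both Connection and Cursor are disposed automatically on block exit.
with connect('clickhouse://localhost') as conn:
    with conn.cursor() as cursor:
        cursor.execute('SELECT 1')
        print(cursor.fetchall())
```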


insert_dataframe(query, dataframe, transpose=True, external_tables=None, query_id=None, settings=None)

New in version 0.2.0.

Inserts a pandas DataFrame with the specified query.

Returns: number of inserted rows.

query_dataframe(query, params=None, external_tables=None, query_id=None, settings=None)

New in version 0.2.0.

Queries a DataFrame with the specified SELECT query.



clickhouse-driver’s Issues

Support Interval Types

Recent ClickHouse supports intervals/timedeltas, but they have special types.

input_format_skip_unknown_fields setting seems to have no effect

According to the Clickhouse documentation, an exception should be raised if input_format_skip_unknown_fields is set to false

I can reproduce this behavior as expected with clickhouse-client, but clickhouse-driver seems to ignore this setting. In the example below, "z" is not part of the schema.

clickhouse-driver==0.0.10
ClickHouse server version 1.1.54362

output: (no exception raised in either case)

Conda forge feedstock

I'm working on a ClickHouse backend for ibis.
Ibis is installable from both pip and conda. I'd like to use clickhouse-driver, but currently clickhouse-driver and clickhouse-cityhash don't have conda packages.

I've already created the recipes, but conda-forge packages require maintainers. Would you please create the feedstocks?

DBAPI Support

Thanks for the hard work on this great project.

Does this driver already implement the DBAPI? If not, do we have a plan?

Wrong DateTime insert

After inserting datetime.datetime(2018, 1, 19, 10) through this driver, I see the value '2018-01-19 13:00:00' in the table.
The timezone on both my computer and the ClickHouse server is Moscow.

What I must do to see ‘2018-01-19 10:00:00’ after insert?

Pandas interop

@xzkostyan would you like to include pandas support?

Doesn’t work with ipv6-only hosts

SELECT INTO OUTFILE

Does the current version of the driver ignore SELECT ... INTO OUTFILE?
There is no difference between the queries
select * from log LIMIT 10000 INTO OUTFILE '/var/tmp/test123.csv' FORMAT TabSeparated
and
select * from log LIMIT 10000

clickhouse-driver version 0.0.16

Feature ‘format JSON’ needed

Can you provide ‘format JSON’ as the clickhouse-client does?

Extended support of IPv4 and IPv6 column types

I am afraid I come with a rather specific request here. I opened this issue last week on the Yandex/clickhouse repository: ClickHouse/ClickHouse#2605. It was about issues with support of specific column types to store IPv4 and IPv6 data.

I didn't get any sort of positive answer from them, at least short term.

I was wondering if it could make sense to develop a form of additional types in your code, such that columns named IPv4_. or IPv6_. benefit from a specific behaviour. The idea would be to apply conversion functions in your code and introduce those columns as type INET (similar to PostgreSQL): https://www.postgresql.org/docs/9.1/static/datatype-net-types.html.

With this feature, the type could be handled by upper layers like your SQLAlchemy driver and the related UI and frontend.

Memory Overflow

I'm running ClickHouse in production, using the asyncio wrapper. But sometimes I get an issue when inserting into the database.

I don’t know if the issue is the server or the client.

execute wait long time

[[email protected] tmp]# python
Python 2.6.6 (r266:84292, Aug 18 2016, 15:13:37)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-17)] on linux2
Type "help", "copyright", "credits" or "license" for more information.

>>> from clickhouse_driver import Client
>>> client = Client('192.168.133.2', 18123, 'client_report', 'admin', '************', insert_block_size=1)
>>> client.execute('SHOW TABLES')
^CTraceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python2.6/site-packages/clickhouse_driver/client.py", line 159, in execute
self.connection.force_connect()
File "/usr/lib/python2.6/site-packages/clickhouse_driver/connection.py", line 122, in force_connect
self.connect()
File "/usr/lib/python2.6/site-packages/clickhouse_driver/connection.py", line 188, in connect
self.receive_hello()
File "/usr/lib/python2.6/site-packages/clickhouse_driver/connection.py", line 263, in receive_hello
packet_type = read_varint(self.fin)
File "/usr/lib/python2.6/site-packages/clickhouse_driver/reader.py", line 29, in read_varint
i = _read_one(f)
File "/usr/lib/python2.6/site-packages/clickhouse_driver/reader.py", line 14, in _read_one
c = f.read(1)
File "/usr/lib64/python2.6/socket.py", line 383, in read
data = self._sock.recv(left)
KeyboardInterrupt

Extremely slow on large select, http protocol almost 10 times faster

It seems that selecting large datasets using the native client is extremely slow. Here is my benchmark https://gist.github.com/dmitriyshashkin/6a4849bdcf882ba340cdfbc1990da401

Initially, I’ve encountered this behavior on my own dataset, but I was able to reproduce it using the dataset and the structure described here https://clickhouse.yandex/docs/en/getting_started/example_datasets/ontime/

To simplify things a little bit I’ve used the data for just one month: http://transtats.bts.gov/PREZIP/On_Time_On_Time_Performance_2017_12.zip

As you can see, the fastest way to get the data is by using the HTTP protocol with requests and pandas. The problem gets worse as the number of rows grows; on my own dataset with 5M rows I waited for 1 hour before I had to interrupt the process. The bottleneck is not CH itself: the "top" command shows that all the work is done by python at 100% CPU utilization, while CH is almost idle.

Set query settings

If a setting is set to a value different from the default one, then it should be sent to the server.
It could be done in connection.send_query() before write_binary_str('', self.fout) # end of query settings

Better support for writing bytes

insert data into Date column face error

When I try inserting into a DateTime column, it succeeds.

How can I fix this problem? Thanks.

Timeout error

I'm getting a timeout error when trying to fetch results from ClickHouse using this package on Windows.
Any ideas how I can fix it?

P.S.
I checked these connection credentials via the JDBC driver.

Error if enum key is blank

How to reproduce

Create table with enum field and insert a few rows

Try to select with python driver:

Links related issues

Do you need to explicitly close the connection using clickhouse_driver?

Set settings.limits

Multithreaded client

I've had some issues using clickhouse_driver in a multithreaded asyncio environment. The connection seems not to be thread-safe. I'm not sure this is the best approach to the issues I've encountered, but here is a pooled connection implementation I am using:

column/string value may be None: 'NoneType' object has no attribute 'encode'

The column/string value may be None:

def try_encode(self, value):
    if not isinstance(value, bytes):
        return value.encode('utf-8')
    return value

Unknown type Tuple(Float64, Float64)

Can't execute a query whose result includes a column of type Array(Tuple(Float64, Float64)).

/.pyenv/versions/jupyter3.6.4/lib/python3.6/site-packages/clickhouse_driver/columns/service.py in read_column(context, column_spec, n_items, buf)
     65 def read_column(context, column_spec, n_items, buf):
     66     column_options = {'context': context}
---> 67     column = get_column_by_spec(column_spec, column_options=column_options)
     68     return column.read_data(n_items, buf)
     69

Accessing rows_before_limit property through API

The BlockStreamProfileInfo.rows_before_limit property is useful to get the rows count for pagination without running an extra query, but there does not seem to be any way to access it through the API (i.e. the client silently ignores the PROFILE_INFO packet).

As a workaround I hacked together a small change in Client.receive_packet where it saves the last packet.profile_info in the Client instance and we can use it like this:

So it’s not pretty, but it seems to work fine. Anyway, I think it would make sense to have it in the main API without additional hacks, but I’m not sure what’s the best place to put it in. Do you have any thoughts on that? Or perhaps are there any additional caveats that might have caused leaving this out of the API scope?

Error writing string while using Superset and ClickHouse driver (text encoding not set to UTF-8)

Something new is happening while adding a ClickHouse db to Superset, at the moment Superset tries to get table metadata.

Get progress info

It seems the Progress packets are received and managed, but there is no way to get the info from the Client or Connection objects. Here is an API proposal with a fetch* method; this is common in database APIs.

AttributeError when receiving ServerPacketTypes.PROFILE_INFO

After updating to 0.0.16 the following error appears:

AttributeError: ‘Client’ object has no attribute ‘execute_iter’

looks like there is no execute_iter method in version ‘0.0.10’.

from clickhouse_driver import Client

client = Client(host='localhost', port=2102, database='shard01')

rows_gen = client.execute_iter('select * from query.TCC_S1', settings=settings)

AttributeError Traceback (most recent call last)
in ()
2 client = Client(host='localhost', port=2102, database='shard01')
3 settings = {'max_block_size': 100000}
----> 4 rows_gen = client.execute_iter('select * from query.TCC_S1', settings=settings)

AttributeError: 'Client' object has no attribute 'execute_iter'

SQL injections?

Thanks for this great library. I was wondering whether it has any protection against SQL injection. https://github.com/mymarilyn/clickhouse-driver/blob/master/src/util/escape.py does not seem to have any checks for injected queries.
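For context, clickhouse-driver substitutes %(name)s-style parameters on the client side, quoting values before splicing them into the query text. The toy function below is NOT the library's actual escape.py, just an illustration of why that quoting matters (all names are made up):

```python
def escape_param(value):
    """Toy escaper: quote strings, pass numbers through (illustration only)."""
    if isinstance(value, str):
        escaped = value.replace("\\", "\\\\").replace("'", "\\'")
        return "'" + escaped + "'"
    if isinstance(value, (int, float)):
        return str(value)
    raise TypeError("unsupported parameter type: %r" % type(value))


query = "SELECT * FROM users WHERE name = %(name)s"
params = {"name": "Bob'; DROP TABLE users --"}

# The quote inside the value is escaped, so it cannot terminate the SQL string.
print(query % {k: escape_param(v) for k, v in params.items()})
```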

Iterator support

Is there any way that the return of a SELECT query could be a row iterator instead of loading the whole query result into memory? Thank you!

Insert NULL values for Nullable types

Hi, I have created a table with Nullable columns. How do I pass NULL values via bulk insert?

It seems there is no analogue value for NULL; None is not working atm.

Actually, None does work and it inserts NULL values 🙂

TooLargeStringSize: Code: 131.

I often get the error TooLargeStringSize: Code: 131 when inserting data into a table. How can I prevent it? I already tried inserting really small batches.

Meta/Column names from query

Meta/column names: how do I get them?

How to get column names with a select

Not working under CentOS 7 after installing from pip

Hi,
I'm experiencing some strange issues with this module, and there are no issues with any other modules:

Installed from pip without issues:

Example code test.py

When using a domain instead of an IP address, the DNS resolution doesn't change.

I use a domain instead of an IP address in order to do load balancing and failover.

But when I change the domain mapping, the script still connects to the original IP address.

For example, I use the domain 'xxx.test.com', which maps to ip1 and ip2, and start the script; the data is written to ip1 and ip2 round-robin.

After remapping the domain to ip2 and ip3 (that is, replacing ip1 with ip3), the script still continues to write to ip1.

Big INSERT ends in timeout

I have an issue with a big INSERT query (> 2000 columns).
It does run with the native clickhouse-client.
After some config changes it runs in 0.5 seconds.

Now I tried to run the same query with the clickhouse-driver.
It times out.

Before I had the same script working with pymysql.

Any idea how to fix this?

Traceback (most recent call last):
File "Click_v01.py", line 154, in <module>
client.execute(sql)
File "/usr/lib/python3.4/site-packages/clickhouse_driver/client.py", line 73, in execute
query_id=query_id, settings=settings
File "/usr/lib/python3.4/site-packages/clickhouse_driver/client.py", line 88, in process_ordinary_query
return self.receive_result(with_column_types=with_column_types)
File "/usr/lib/python3.4/site-packages/clickhouse_driver/client.py", line 19, in receive_result
block = self.receive_block()
File "/usr/lib/python3.4/site-packages/clickhouse_driver/client.py", line 38, in receive_block
packet = self.connection.receive_packet()
File "/usr/lib/python3.4/site-packages/clickhouse_driver/connection.py", line 234, in receive_packet
packet.type = packet_type = read_varint(self.fin)
File "/usr/lib/python3.4/site-packages/clickhouse_driver/reader.py", line 46, in read_varint
i = _read_one(f)
File "/usr/lib/python3.4/site-packages/clickhouse_driver/reader.py", line 31, in _read_one
c = f.read(1)
File "/usr/lib64/python3.4/socket.py", line 378, in readinto
return self._sock.recv_into(b)
socket.timeout: timed out

Nothing type

Recent ClickHouse has a new type, Nothing, which is currently unhandled by clickhouse-driver.

Selecting NULL raises:

Return column type spec

Currently the client returns typenames.

It would be great to get the typespec instead, including nullable flag and/or inner type spec.

clickhouse-driver==0.0.10
ClickHouse server version 1.1.54362

It is also possible to omit values in which case the default value of the column is inserted.

However, clickhouse-driver instead throws an exception:

Working example using clickhouse-client:

I was able to get past the KeyError in the Traceback above by changing

I get to the point of sending the data to the server but get the following error, which I unfortunately don’t have the time to dig into right now to come up with a possible PR/solution:

This would be a great feature so that JSON objects don’t always have to be fully specified in the client code!

Dates are lower by 1

date.fromtimestamp is supposed to take a local timestamp not a UTC one. I am in a timezone that is behind UTC, so all dates are 1 day behind.

To fix this you need to use datetime.datetime.utcfromtimestamp().date() instead.

return datetime.datetime.utcfromtimestamp(value * self.offset).date()
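A stdlib-only demonstration of the off-by-one (day number 17550 since the epoch corresponds to 2018-01-19):

```python
import datetime

# A ClickHouse Date arrives as a count of days since the Unix epoch.
days_since_epoch = 17550
seconds = days_since_epoch * 86400

# fromtimestamp() interprets the value in the *local* timezone; at midnight
# UTC, any zone behind UTC is still on the previous calendar day.
local_date = datetime.date.fromtimestamp(seconds)

# Interpreting the timestamp as UTC gives the intended date in every timezone.
utc_date = datetime.datetime.fromtimestamp(
    seconds, tz=datetime.timezone.utc
).date()
print(utc_date)  # 2018-01-19
```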

Insert Buffer?

The documentation recommends inserting chunks of at least 1000 items at once. Is there an easy way to add some sort of buffer to gather inserts and thus enhance performance? Of course, one could use a simple list and append to it / insert once it reaches a specific size, but things get more complicated when someone uses multiprocessing etc.
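One way to package that "simple list" idea, sketched with a callback standing in for a real client.execute('INSERT INTO t VALUES', rows) call:

```python
class InsertBuffer:
    """Gather rows and hand them to flush_fn in batches (a sketch)."""

    def __init__(self, flush_fn, batch_size=1000):
        self.flush_fn = flush_fn
        self.batch_size = batch_size
        self._rows = []

    def append(self, row):
        self._rows.append(row)
        if len(self._rows) >= self.batch_size:
            self.flush()

    def flush(self):
        if self._rows:
            self.flush_fn(self._rows)
            self._rows = []


batches = []  # stands in for a ClickHouse client performing the INSERT
buf = InsertBuffer(batches.append, batch_size=3)
for i in range(7):
    buf.append((i,))
buf.flush()  # flush the remainder on shutdown

print([len(b) for b in batches])  # [3, 3, 1]
```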

Asyncio client

First of all, great to have a native driver!

Nested datatype

Tuple support

I can’t return unique combinations of two columns (e.g. src and dst).
Executing a query with SELECT DISTINCT(field1, field2) causes the following error when returning data from ClickHouse:

Set a query ID

ClickHouse allows setting an ID on queries; you can see these IDs with:

This is useful when you want to cancel a query from an external process, as explained here: https://stackoverflow.com/questions/40546983/how-to-kill-a-process-query-in-clickhouse

I propose that the connection.send_query() method accepts a string argument to name the query. It could be used in client.process_*_query() functions and execute() as an optional argument:

Column names

client.execute('SELECT * FROM test3')

client.execute('SELECT a, b, c FROM test3')

and I will get list of tuples (one tuple for each row), but I don’t have column names.

Is it somehow possible to get all columns names (named tuple, or something) with the data itself, in order to simplify further data manipulation?
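clickhouse-driver does expose column metadata through the real with_column_types=True option of execute(). A small helper along these lines (the helper itself is not part of the driver) turns that into namedtuples:

```python
from collections import namedtuple

def rows_to_namedtuples(rows, columns):
    """Convert rows plus the (name, type) pairs returned by
    client.execute(..., with_column_types=True) into namedtuples."""
    Row = namedtuple('Row', [name for name, _type in columns])
    return [Row(*row) for row in rows]

# Usage (requires a running ClickHouse server):
# from clickhouse_driver import Client
# client = Client('localhost')
# rows, columns = client.execute('SELECT a, b, c FROM test3',
#                                with_column_types=True)
# for row in rows_to_namedtuples(rows, columns):
#     print(row.a, row.b, row.c)
```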

clickhouse-driver

Python driver for ClickHouse


Features

External data for query processing

You can pass external data along with a query:

Settings

There are a lot of ClickHouse server settings. Settings can be specified during Client initialization:

Each setting can be overridden in an execute statement:
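The code samples were stripped from this page; a sketch of both forms follows, with the host and setting values purely illustrative (the deferred import keeps the sketch loadable without clickhouse-driver installed):

```python
def make_client(host='localhost'):
    """Client with a session-level server setting applied to every query."""
    from clickhouse_driver import Client
    return Client(host, settings={'max_block_size': 100000})

# Per-query override via the settings argument of execute()
# (requires a running ClickHouse server):
# client = make_client()
# client.execute('SELECT * FROM system.numbers LIMIT 5',
#                settings={'max_block_size': 1000})
```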

Compression

Client with compression support can be constructed as follows:
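A sketch of the stripped sample, assuming the compression extras are installed (clickhouse-cityhash plus lz4 or zstd); the import is deferred so the sketch stands alone:

```python
def make_compressed_client(host='localhost'):
    """Client that compresses blocks on the wire.

    Accepted values include 'lz4', 'lz4hc', 'zstd', or True
    (True picks a default algorithm)."""
    from clickhouse_driver import Client
    return Client(host, compression='lz4')
```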

CityHash algorithm notes

Unfortunately, the ClickHouse server comes with a built-in old version of the CityHash algorithm (1.0.2). That’s why we can’t use the original CityHash package; an older version is published separately on PyPI.

Secure connection

Specifying query id

You can manually set a query identifier for each query, for example a UUID:

You can cancel a query with a specific id by sending another query with the same query id, if the option replace_running_query is set to 1.
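A sketch of tagging a query with a UUID via the driver's query_id argument (the KILL QUERY statement in the comment is standard ClickHouse SQL):

```python
import uuid

def run_with_query_id(client, sql):
    """Execute sql tagged with a fresh UUID, so the query can be found
    in system.processes and cancelled externally with
    KILL QUERY WHERE query_id = '<that uuid>'."""
    query_id = str(uuid.uuid4())
    return query_id, client.execute(sql, query_id=query_id)
```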

Query results are fetched by the same instance of Client that emitted the query.

Retrieving results in columnar form

Columnar form can sometimes be more useful.
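With columnar=True, execute() returns one tuple per column instead of one tuple per row. The reshaping is the same as a plain-Python transpose, shown here without a server:

```python
# Row-oriented result, as execute() returns by default:
rows = [(1, 'a'), (2, 'b'), (3, 'c')]

# Column-oriented shape, as columnar=True returns:
columns = list(zip(*rows))
print(columns)  # [(1, 2, 3), ('a', 'b', 'c')]

# With clickhouse-driver (requires a running server; table name illustrative):
# client.execute('SELECT id, name FROM some_table', columnar=True)
```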

Data types checking on INSERT

Data type checks are disabled on INSERT queries for performance. You can turn them on with the types_check option:

Query execution statistics

The client stores statistics about the last query execution, which can be obtained by accessing the last_query attribute. Statistics are sent by the ClickHouse server and calculated on the client side. last_query contains info about:

profile: rows before limit

Receiving server logs

Query logs can be received from server by using send_logs_level setting:

Multiple hosts

New in version 0.1.3.

This option is good for ClickHouse cluster with multiple replicas.

In the example above, on every new connection the driver will use the following sequence of hosts if the previous host is unavailable:

All queries within an established connection will be sent to the same host.
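The fallback list is supplied via the driver's alt_hosts argument; a sketch with illustrative host names (deferred import, no server needed to define it):

```python
def make_failover_client():
    """Client that tries host1 first, then the alt_hosts in order
    when the previous host is unavailable."""
    from clickhouse_driver import Client
    return Client('host1', alt_hosts='host2:9000,host3:9000')
```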

Python DB API 2.0

New in version 0.1.3.

This driver also implements the DB API 2.0 specification. It can be useful for various integrations.

Threads may share the module and connections.

The Connection class is just a wrapper for handling multiple cursors (clients) and does not initiate an actual connection to the ClickHouse server.

There are some non-standard ClickHouse-related Cursor methods for: external data, settings, etc.

For automatic disposal, Connection and Cursor instances can be used as context managers:
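A sketch of the context-manager form, assuming a local server (the import is deferred so the function can be defined without clickhouse-driver installed):

```python
def fetch_one(host='localhost'):
    """Run a trivial query through the DB API; both Connection and
    Cursor dispose themselves on context exit."""
    from clickhouse_driver import dbapi

    with dbapi.connect(host=host) as conn:
        with conn.cursor() as cursor:
            cursor.execute('SELECT 1')
            return cursor.fetchall()
```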

NumPy arrays support

New in version 0.1.6.

NumPy arrays are not used when reading nullable columns and columns of unsupported types.

Direct loading into NumPy arrays increases performance and lowers memory requirements on large amounts of rows.

Direct loading into a pandas DataFrame is also supported via query_dataframe:

Writing a pandas DataFrame is also supported via insert_dataframe:
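A sketch of both DataFrame methods, assuming clickhouse-driver is installed with its [numpy] extra and a server is reachable (the table name is illustrative; use_numpy enables the NumPy/pandas path):

```python
def round_trip(df, table, host='localhost'):
    """Write a DataFrame to a table and read it back."""
    from clickhouse_driver import Client
    client = Client(host, settings={'use_numpy': True})
    client.insert_dataframe('INSERT INTO %s VALUES' % table, df)
    return client.query_dataframe('SELECT * FROM %s' % table)
```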


clickhouse-driver

Python driver for ClickHouse


Quickstart

This page gives a good introduction to clickhouse-driver. It assumes you already have clickhouse-driver installed. If you do not, head over to the Installation section.

A minimal working example looks like this:

This code will show all tables from ‘default’ database.

There are two conceptual types of queries:

Selecting data

Simple select query looks like:

Of course queries can and should be parameterized to avoid SQL injections:

Customising SELECT output with the FORMAT clause is not supported.

Selecting data with progress statistics

Streaming results

When you are dealing with large datasets block by block results streaming may be useful:

Inserting data

Insert queries in the Native protocol are a little bit tricky because of ClickHouse’s columnar nature (and because we’re using Python).

INSERT query consists of two parts: query statement and query values. Query values are split into chunks called blocks. Each block is sent in binary columnar form.

As data in each block is sent in binary form, we should not serialize it into a string via %(a)s substitution and then deserialize it back into Python types.

This INSERT would be extremely slow if executed with thousands of rows of data:

To insert data efficiently, provide data separately, and end your statement with a VALUES clause:

You can use any iterable yielding lists, tuples or dicts.
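A generator is a natural fit here, since it keeps memory flat for large loads; a sketch with illustrative table and column names:

```python
from datetime import date

def gen_rows(n):
    """Yield (id, day) tuples; any iterable of tuples/lists/dicts
    is accepted as INSERT data by clickhouse-driver."""
    for i in range(n):
        yield (i, date(2022, 1, 1))

# The query ends with VALUES and the data is passed separately
# (requires a running server):
# client.execute('INSERT INTO some_table (id, day) VALUES', gen_rows(100000))
```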

If data is not passed, the connection will be terminated after a timeout.

The following WILL NOT work:

Of course, for INSERT … SELECT queries data is not needed:

ClickHouse will execute this query like a usual SELECT query.

Inserting data in different formats with the FORMAT clause is not supported.

See Inserting data from CSV file if you need to insert data in a custom format.

DDL queries can be executed in the same way SELECT queries are executed:

Async and multithreading

Every ClickHouse query is assigned an identifier to enable request execution tracking. However, ClickHouse native protocol is synchronous: all incoming queries are executed consecutively. Clickhouse-driver does not yet implement a connection pool.

To utilize ClickHouse’s asynchronous capability you should either use multiple Client instances or implement a queue.

The same applies to multithreading: queries from different threads can’t use one Client instance with a single connection. You should use different clients for different threads.

However, if you are using the DB API for communication with the server, each cursor creates its own Client instance. This makes communication thread-safe.

aiochclient 2.2.0

pip install aiochclient Copy PIP instructions

Released: Aug 18, 2022

Async http clickhouse client for python 3.6+


License: MIT License (MIT)

Tags clickhouse, async, python, aiohttp


aiochclient

An async http(s) ClickHouse client for python 3.6+ supporting type conversion in both directions, streaming, lazy decoding on select queries, and a fully typed interface.

Table of Contents

Installation

You can use it with either the aiohttp or httpx HTTP connector.

To use with aiohttp install it with command:

Or aiochclient[aiohttp-speedups] to install with extra speedups.

To use with httpx install it with command:

Or aiochclient[httpx-speedups] to install with extra speedups.

Installing with [*-speedups] adds the following:

Additionally the installation process attempts to use Cython for a speed boost (roughly 30% faster).

Quick Start

Connecting to ClickHouse

aiochclient needs aiohttp.ClientSession or httpx.AsyncClient to connect to ClickHouse:

Querying the database

For fetching all rows at once use the fetch method:

For fetching first row from result use the fetchrow method:

You can also use fetchval method, which returns first value of the first row from query result:

With async iteration on the query results stream you can fetch multiple rows without loading them all into memory at once:

Use fetch / fetchrow / fetchval / iterate for SELECT queries, and execute (or any of the above) for INSERT and all other queries.
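The four fetch methods can be sketched together as follows, assuming a ClickHouse HTTP endpoint on localhost:8123 (imports are deferred so the coroutine can be defined without aiohttp/aiochclient installed):

```python
import asyncio

async def query_examples(url='http://localhost:8123'):
    from aiohttp import ClientSession
    from aiochclient import ChClient

    async with ClientSession() as session:
        client = ChClient(session, url=url)
        all_rows = await client.fetch('SELECT number FROM system.numbers LIMIT 5')
        first = await client.fetchrow('SELECT number FROM system.numbers LIMIT 5')
        one_value = await client.fetchval('SELECT count() FROM system.tables')
        # Streaming: rows are decoded lazily, one at a time.
        async for row in client.iterate('SELECT number FROM system.numbers LIMIT 5'):
            print(row[0])
        return all_rows, first, one_value

# asyncio.run(query_examples())  # needs a running ClickHouse HTTP endpoint
```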

Working with query results

All fetch queries return rows as lightweight, memory-efficient objects. Before v1.0.0, rows were only returned as tuples. All rows have a full mapping interface, where you can get fields by name or index:

Documentation

To check out the API docs, visit the readthedocs site.

Type Conversion

aiochclient automatically converts types from ClickHouse to python types and vice-versa.

Connection Pool Settings

aiochclient uses the aiohttp.TCPConnector to determine pool size. By default, the pool limit is 100 open connections.

Notes on Speed

It’s highly recommended to use uvloop and to install aiochclient with speedups for the sake of speed. Some recent benchmarks on our machines, without parallelization:

Note: these benchmarks are system dependent

airflow-clickhouse-plugin 0.8.2

pip install airflow-clickhouse-plugin Copy PIP instructions

Released: Jun 11, 2022


License: MIT License

Tags clickhouse, airflow

Requires: Python >=3.6.*


Airflow ClickHouse Plugin

Features

Installation and dependencies

Requires apache-airflow and clickhouse-driver (installed automatically by pip). Primarily supports Airflow 2.0–2.3. Later versions are expected to work properly but may not be fully tested. Use plugin versions below 0.6.0 (e.g. 0.5.7.post1) to preserve compatibility with Airflow 1.10.6 (this version has long-term support on Google Cloud Composer).

Note on pandas dependency

Usage

ClickHouseOperator Reference

To import ClickHouseOperator use: from airflow_clickhouse_plugin.operators.clickhouse_operator import ClickHouseOperator

The result of the last query is pushed to XCom.
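A sketch of a task definition; the import path follows the README above, while the task id, SQL, and connection id are illustrative (the deferred import keeps the sketch loadable without Airflow installed):

```python
def build_count_task():
    """Build a ClickHouseOperator task; the result of the last
    query is pushed to XCom."""
    from airflow_clickhouse_plugin.operators.clickhouse_operator import (
        ClickHouseOperator,
    )
    return ClickHouseOperator(
        task_id='count_rows',
        sql='SELECT count() FROM some_table',
        clickhouse_conn_id='clickhouse_default',
    )
```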

ClickHouseHook Reference

To import ClickHouseHook use: from airflow_clickhouse_plugin.hooks.clickhouse_hook import ClickHouseHook

Supported kwargs of constructor ( __init__ method):

Supports all the methods of the Airflow BaseHook including:

ClickHouseSqlSensor Reference

The sensor fully inherits from the Airflow SQLSensor and therefore fully implements its interface, using ClickHouseHook to fetch the SQL execution result; it supports templating of the sql argument.

ClickHouse Connection schema

clickhouse_driver.Client is initialized with attributes stored in the Airflow Connection. The mapping of the attributes is listed below:

Airflow Connection attribute → Client.__init__ argument

host     → host
port     → port
schema   → database
login    → user
password → password

If you pass database argument to ClickHouseOperator or ClickHouseHook explicitly then it is passed to the Client instead of the schema attribute of the Airflow connection.

Extra arguments

For example, if the Airflow connection contains extra={"secure": true} then Client.__init__ will receive secure=True as a keyword argument in addition to other non-empty connection attributes.

Default values

If an Airflow connection attribute is not set, it is not passed to the Client at all. In that case the default value of the corresponding clickhouse_driver.Connection argument is used (e.g. user defaults to 'default').

This means that the Airflow ClickHouse Plugin does not itself define any default values for the ClickHouse connection. You may fully rely on the default values of the clickhouse-driver version you use. The only exception is host: if the attribute of the Airflow connection is not set, then 'localhost' is used.

Default connection

Examples

ClickHouseOperator Example

ClickHouseHook Example

Important note: don’t try to insert values using the ch_hook.run('INSERT INTO some_ch_table VALUES (1)') literal form. clickhouse-driver requires values for an INSERT query to be provided via parameters due to specifics of the native ClickHouse protocol.

ClickHouseSqlSensor Example

How to run tests

Unit tests

Integration tests

Integration tests require access to ClickHouse server. Tests use connection URI defined via environment variable AIRFLOW_CONN_CLICKHOUSE_DEFAULT with clickhouse://localhost as default.

All tests

Github Actions

A GitHub Action is set up for this project.

Run tests using Docker

Run ClickHouse server inside Docker:

The above command will open bash inside the container.

Install dependencies into container and run tests (execute inside container):

How to upload to PyPI

Run tests for test PyPI version:

Pandas test may fail.

Test public PyPI (run clickhouse container), with pandas:


madiedinro/simple-clickhouse


Simple ClickHouse lib

Install using pip from pypi repository

Or latest version from git

When used within Rockstat, the parameters do not need to be specified; they are filled in automatically from environment variables.

Selecting without decoding

Selecting as a stream of dicts

Disabling decoding for streaming data

To get the result as raw strings, use bytes_decoder.

Executing sql statements

For writing data, managing the database, and other (non-select) operations, use the run method.

It can also be used for “manual” writing of data.

Microbatch writing using context manager

new

On exit context all data will be flushed.

The old, manually controlled mechanism.

Some Simple Magic

To create instance of TableDiscovery call

Either records or columns must be provided.

Detect using present data

After using table auto-discovery once, you should switch to a fixed layout. To do this easily, try TableDiscovery.pycode()

will be returned

Correct detected / explicitly set data types

TableDiscovery.int(*args) sets columns to int

Set date columns

Set date column

Set str columns

Set string column

Set primary key columns

Set metrics

others are marked as dimensions

Set dimensions

others are marked as metrics

Print table create statement / execute query

Difference handling. Be careful: it is currently a proof of concept.

All records will be flushed to DB on context exit

Executing a query and reading the entire result at once

Fetching records as a stream

Executing SQL operations

all data will be flushed on exit context

The MIT License (MIT)

Copyright (c) 2018-2019 Dmitry Rodin

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

asynch 0.1.9

pip install asynch Copy PIP instructions

Released: Jun 2, 2021

An asyncio driver for ClickHouse with native TCP protocol


License: Apache Software License (Apache-2.0)

Author: long2ice

Tags ClickHouse, asyncio, driver

Requires: Python >=3.7


asynch

Introduction

asynch is an asyncio ClickHouse Python driver with native (TCP) interface support, which reuses most of clickhouse-driver and complies with PEP 249.

Install

Usage

Connect to ClickHouse

Create table by sql

Use DictCursor to get result with dict
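A sketch of the DictCursor usage following the project README (the DSN is illustrative; imports are deferred so the coroutine can be defined without asynch installed):

```python
import asyncio

async def select_as_dicts(dsn='clickhouse://default:@127.0.0.1:9000/default'):
    """Run a query and get each row back as a dict."""
    from asynch import connect
    from asynch.cursors import DictCursor

    conn = await connect(dsn=dsn)
    async with conn.cursor(cursor=DictCursor) as cursor:
        await cursor.execute('SELECT 1 AS x')
        return cursor.fetchall()

# asyncio.run(select_as_dicts())  # needs a running ClickHouse server
```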

Insert data with dict

Insert data with tuple

Use connection pool

ThanksTo

License

This project is licensed under the Apache-2.0 License.



clickhouse-sqlalchemy 0.2.2

pip install clickhouse-sqlalchemy Copy PIP instructions

Released: Aug 24, 2022

Simple ClickHouse SQLAlchemy Dialect


License: MIT License (MIT)

Tags ClickHouse, db, database, cloud, analytics

Requires: Python >=3.6


ClickHouse SQLAlchemy

ClickHouse dialect for SQLAlchemy.

Documentation

Usage

native [recommended] (TCP) via clickhouse-driver

http via requests

Insert some data

And query inserted data
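Both transports above can be sketched through a standard SQLAlchemy engine URI; the credentials and database are illustrative (deferred import, no server needed to define the helper):

```python
def make_clickhouse_engine(uri='clickhouse+native://default@localhost/default'):
    """'clickhouse+native' goes over TCP via clickhouse-driver;
    a plain 'clickhouse://' URI uses HTTP via requests."""
    from sqlalchemy import create_engine
    return create_engine(uri)
```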

License

ClickHouse SQLAlchemy is distributed under the MIT license.


clickhouse-driver

Python driver for ClickHouse


Performance

This section compares clickhouse-driver performance over the Native interface with the TSV and JSONEachRow formats available over the HTTP interface.

clickhouse-driver returns already-parsed row items as Python data types. The driver performs all transformations for you.

When you read data over HTTP you may need to cast strings into Python types.

Test data

Sample data for testing is taken from ClickHouse docs.

Create database and table:

Download some data for 2017 year:

Insert data into ClickHouse:

Required packages

For fast json parsing we’ll use ujson package:

Versions

Machine: Linux ThinkPad-T460 4.4.0-177-generic #207-Ubuntu SMP Mon Mar 16 01:16:10 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

Python: CPython 3.6.5 (default, May 30 2019, 14:48:31) [GCC 5.4.0 20160609]

Benchmarking

Scripts below can be benchmarked with following one-liner:

Time will measure:

Plain text without parsing

Let’s take the plain text response from the ClickHouse server as a baseline.

Fetching unparsed data with pure requests (1)

Parsed rows

A line split into elements will be considered "parsed" for the TSV format (2)

Now we cast each element to its data type (2.5)

JSONEachRow format can be loaded with json loads (3)

Get fully parsed rows with clickhouse-driver in Native format (4)

Iteration over rows

Iteration over TSV (5)

Now we cast each element to its data type (5.5)

Iteration over JSONEachRow (6)

Iteration over rows with clickhouse-driver in Native format (7)

Iteration over string rows

OK, but what if we need only string columns?

Iteration over TSV (8)

Iteration over JSONEachRow (9)

Iteration over string rows with clickhouse-driver in Native format (10)

Iteration over int rows

Iteration over TSV (11)

Iteration over JSONEachRow (12)

Iteration over int rows with clickhouse-driver in Native format (13)

Results

This table contains memory and timing benchmark results of snippets above.

JSON in table is shorthand for JSONEachRow.

Rows                                     50k      131k     217k     450k     697k

Plain text without parsing: timing
  Naive requests.get TSV (1)             0.40 s   0.67 s   0.95 s   1.67 s   2.52 s
  Naive requests.get JSON (1)            0.61 s   1.23 s   2.09 s   3.52 s   5.20 s

Plain text without parsing: memory
  Naive requests.get TSV (1)             49 MB    107 MB   165 MB   322 MB   488 MB
  Naive requests.get JSON (1)            206 MB   564 MB   916 MB   1.83 GB  2.83 GB

Parsed rows: timing
  requests.get TSV (2)                   0.81 s   1.81 s   3.09 s   7.22 s   11.87 s
  requests.get TSV with cast (2.5)       1.78 s   4.58 s   7.42 s   16.12 s  25.52 s
  requests.get JSON (3)                  2.14 s   5.65 s   9.20 s   20.43 s  31.72 s
  clickhouse-driver Native (4)           0.73 s   1.40 s   2.08 s   4.03 s   6.20 s

Parsed rows: memory
  requests.get TSV (2)                   171 MB   462 MB   753 MB   1.51 GB  2.33 GB
  requests.get TSV with cast (2.5)       135 MB   356 MB   576 MB   1.15 GB  1.78 GB
  requests.get JSON (3)                  139 MB   366 MB   591 MB   1.18 GB  1.82 GB
  clickhouse-driver Native (4)           135 MB   337 MB   535 MB   1.05 GB  1.62 GB

Iteration over rows: timing
  requests.get TSV (5)                   0.49 s   0.99 s   1.34 s   2.58 s   4.00 s
  requests.get TSV with cast (5.5)       1.38 s   3.38 s   5.40 s   10.89 s  16.59 s
  requests.get JSON (6)                  1.89 s   4.73 s   7.63 s   15.63 s  24.60 s
  clickhouse-driver Native (7)           0.62 s   1.28 s   1.93 s   3.68 s   5.54 s

Iteration over rows: memory
  requests.get TSV (5)                   19 MB    19 MB    19 MB    19 MB    19 MB
  requests.get TSV with cast (5.5)       19 MB    19 MB    19 MB    19 MB    19 MB
  requests.get JSON (6)                  20 MB    20 MB    20 MB    20 MB    20 MB
  clickhouse-driver Native (7)           56 MB    70 MB    71 MB    71 MB    71 MB

Iteration over string rows: timing
  requests.get TSV (8)                   0.40 s   0.67 s   0.80 s   1.55 s   2.18 s
  requests.get JSON (9)                  1.14 s   2.64 s   4.22 s   8.48 s   12.96 s
  clickhouse-driver Native (10)          0.46 s   0.91 s   1.35 s   2.49 s   3.67 s

Iteration over string rows: memory
  requests.get TSV (8)                   19 MB    19 MB    19 MB    19 MB    19 MB
  requests.get JSON (9)                  20 MB    20 MB    20 MB    20 MB    20 MB
  clickhouse-driver Native (10)          46 MB    56 MB    57 MB    57 MB    57 MB

Iteration over int rows: timing
  requests.get TSV (11)                  0.84 s   2.06 s   3.22 s   6.27 s   10.06 s
  requests.get JSON (12)                 0.95 s   2.15 s   3.55 s   6.93 s   10.82 s
  clickhouse-driver Native (13)          0.43 s   0.61 s   0.86 s   1.53 s   2.27 s

Iteration over int rows: memory
  requests.get TSV (11)                  19 MB    19 MB    19 MB    19 MB    19 MB
  requests.get JSON (12)                 20 MB    20 MB    20 MB    20 MB    20 MB
  clickhouse-driver Native (13)          41 MB    48 MB    48 MB    48 MB    49 MB

Conclusion

If you need to get significant number of rows from ClickHouse server as text then TSV format is your choice. See Iteration over string rows results.

It doesn’t matter which interface you use if you manipulate only a small number of rows.

ClickSQL 0.1.9.4

pip install ClickSQL Copy PIP instructions

Released: Nov 24, 2021

A python client for Clickhouse


License: MIT Licence

Author: sn0wfree

Tags ClickHouse, Databases, SQL, Python, Client


ClickSQL: ClickHouse client for Humans

ClickSQL is a Python client for the ClickHouse database, which helps users work with ClickHouse in an easier, more Pythonic way. More information on ClickHouse can be found here

Installation

pip install ClickSQL

Usage

Initial connection

to set up a database connection and send a heartbeat-check signal

Query

execute a SQL Query

execute a Query without SQL

Insert data

insert data into the database in various ways

Insert data via DataFrame

Insert data via SQL(Inner)

Create table

Create table by SQL

Create table by DataFrame

Contribution

Contributions are welcome: feel free to improve this package or submit an issue.

Author

Available functions or properties

In Process

schedule


Python clickhouse driver

Each ClickHouse type is deserialized to a corresponding Python type when SELECT queries are prepared. When serializing INSERT queries, clickhouse-driver accepts a broader range of Python types. The following ClickHouse types are supported by clickhouse-driver:

Date32 support is new in version 0.2.2.

Timezone support is new in version 0.0.11. DateTime64 support is new in version 0.1.3.

Integers are interpreted as seconds without timezone (UNIX timestamps). Integers can be used when insertion of datetime column is a bottleneck.

Setting use_client_time_zone is taken into consideration.

You can cast DateTime column to integers if you are facing performance issues when selecting large amount of rows.

Due to Python’s current limitations minimal DateTime64 resolution is one microsecond.

String column is encoded/decoded with encoding specified by strings_encoding setting. Default encoding is UTF-8.

You can specify custom encoding:

Encoding is applied to all string fields in query.

String columns can be returned without any decoding. In this case return values are bytes:
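A sketch of a client configured with the real strings_as_bytes setting (deferred import; host is illustrative):

```python
def bytes_client(host='localhost'):
    """Client whose String columns come back as raw bytes,
    with no decoding applied."""
    from clickhouse_driver import Client
    return Client(host, settings={'strings_as_bytes': True})
```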

If a column has the FixedString type, upon return from a SELECT it may contain trailing zeroes in accordance with ClickHouse’s storage format. Trailing zeroes are stripped by the driver for convenience.

During INSERT, if strings_as_bytes setting is not specified and string cannot be encoded with encoding, a UnicodeEncodeError will be raised.

Currently clickhouse-driver can’t handle an empty enum value due to Python’s Enum mechanics: an Enum member name must not be empty. See issue and workaround.

ClickHouse/clickhouse-odbc


ODBC Driver for ClickHouse


This is the official ODBC driver implementation for accessing ClickHouse as a data source.

For more information on ClickHouse go to ClickHouse home page.

For more information on what ODBC is go to ODBC Overview.

The canonical repo for this driver is located at https://github.com/ClickHouse/clickhouse-odbc.

See LICENSE file for licensing information.

Table of contents

Pre-built binary packages of the release versions of the driver are available for the most common platforms at:

Note that since ODBC drivers are not used directly by a user, but rather accessed through applications, which in turn access the driver through an ODBC driver manager, the user has to install the driver for the same architecture (32- or 64-bit) as the application that is going to access it. Moreover, both the driver and the application must be compiled for (and actually use at run time) the same ODBC driver manager implementation (we call them «ODBC providers» here). There are three supported ODBC providers:

If you have Homebrew installed (usually applicable to macOS only, but can also be available in Linux), just execute:

If you don’t see a package that matches your platform under Releases, or the version of your system is significantly different from those of the available packages, or you want to try a bleeding-edge version of the code that hasn’t been released yet, you can always build the driver manually from sources:

Native packages carry all the dependency information, so when you install the driver from a native package, all required run-time packages are installed automatically. If you use manual packaging, i.e., just extract the driver binaries to some folder, you have to make sure yourself that all the run-time dependencies are satisfied on your system:

The first step usually consists of registering the driver so that the corresponding ODBC provider is able to locate it.

The next step is defining one or more DSNs, associated with the newly registered driver, and setting driver-specific parameters in the body of those DSN definitions.

All this involves modifying dedicated registry keys in the case of MDAC, or editing the odbcinst.ini (for driver registration) and odbc.ini (for DSN definition) files for UnixODBC or iODBC, directly or indirectly.

This is performed automatically, using some default values, if you install the driver using one of the native installers.

Otherwise, if you are configuring manually, or need to modify the default configuration created by the installer, please see the exact locations of files (or registry keys) that need to be modified in the corresponding section below:

The list of DSN parameters recognized by the driver is as follows:

URL query string

Some of the configuration parameters can be passed to the server as part of the query string of the URL.

The list of parameters in the query string of the URL that are also recognized by the driver is as follows:

Parameter        Default value   Description
database         default         Database name to connect to
default_format   ODBCDriver2     Default wire format of the resulting data that the server will send to the driver. Formats supported by the driver are ODBCDriver2 and RowBinaryWithNamesAndTypes.

Note that currently there is a difference in timezone handling between the ODBCDriver2 and RowBinaryWithNamesAndTypes formats: in ODBCDriver2, date and time values are presented to the ODBC application in the server’s timezone, whereas in RowBinaryWithNamesAndTypes they are converted to the local timezone. This behavior will be changed/parametrized in the future. If the server and ODBC application timezones are the same, date and time value handling is effectively identical between the two formats.

Troubleshooting: driver manager tracing and driver logging

To debug issues with the driver, the first things to do are:

Building from sources

The general requirements for building the driver from sources are as follows:

Additional requirements exist for each platform, which also depend on whether packaging and/or testing is performed.

See the exact steps for each platform in the corresponding section below:

The list of configuration options recognized during the CMake generation step is as follows:

Run-time dependencies: Windows

All modern Windows systems come with the MDAC driver manager preinstalled.

Run-time dependencies: macOS

Execute the following in the terminal (assuming you have Homebrew installed):

Execute the following in the terminal (assuming you have Homebrew installed):
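The two identical prompts above correspond to the two supported ODBC providers; as a sketch (Homebrew formula names assumed), the commands would be:

```shell
# iODBC provider
brew install libiodbc

# UnixODBC provider
brew install unixodbc
```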

Run-time dependencies: Red Hat/CentOS

Execute the following in the terminal:

Execute the following in the terminal:

Run-time dependencies: Debian/Ubuntu

Execute the following in the terminal:

Execute the following in the terminal:

Configuration: MDAC/WDAC (Microsoft/Windows Data Access Components)

To configure already installed drivers and DSNs, or create new DSNs, use Microsoft ODBC Data Source Administrator tool:

For full description of ODBC configuration mechanism in Windows, as well as for the case when you want to learn how to manually register a driver and have a full control on configuration in general, see:

Note that the keys are subject to the «Registry Redirection» mechanism, with caveats.

You can find sample configuration for this driver here (just map the keys to corresponding sections in registry):

In short, usually you will end up editing /etc/odbcinst.ini and /etc/odbc.ini for system-wide driver and DSN entries, and ~/.odbc.ini for user-wide driver and DSN entries.

For more info, see:

You can find sample configuration for this driver here:

These samples can be added to the corresponding configuration files using the odbcinst tool (assuming the package is installed under /usr/local ):
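Assuming the sample INI files ship under /usr/local/share/doc/clickhouse-odbc/config (an assumed path), the odbcinst invocations might look like:

```shell
# Register the driver (system-wide)
odbcinst -i -d -f /usr/local/share/doc/clickhouse-odbc/config/odbcinst.ini

# Register a system DSN
odbcinst -i -s -l -f /usr/local/share/doc/clickhouse-odbc/config/odbc.ini
```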

In short, usually you will end up editing /etc/odbcinst.ini and /etc/odbc.ini for system-wide driver and DSN entries, and ~/.odbc.ini for user-wide driver and DSN entries.

In macOS, if those INI files exist, they usually are symbolic or hard links to /Library/ODBC/odbcinst.ini and /Library/ODBC/odbc.ini for system-wide, and ~/Library/ODBC/odbc.ini for user-wide configs, respectively.

For more info, see:

You can find sample configuration for this driver here:

Enabling driver manager tracing: MDAC/WDAC (Microsoft/Windows Data Access Components)

Comprehensive explanations (possibly, with some irrelevant vendor-specific details though) on how to enable ODBC driver manager tracing could be found at the following links:

Enabling driver manager tracing: UnixODBC

Comprehensive explanations (possibly, with some irrelevant vendor-specific details though) on how to enable ODBC driver manager tracing could be found at the following links:

Enabling driver manager tracing: iODBC

Comprehensive explanations (possibly, with some irrelevant vendor-specific details though) on how to enable ODBC driver manager tracing could be found at the following links:

Building from sources: Windows

CMake bundled with the recent versions of Visual Studio can be used.

An SDK required for building the ODBC driver is included in the Windows SDK, which in turn is bundled with Visual Studio.

All of the following commands have to be issued in Visual Studio Command Prompt:

Clone the repo with submodules:

Enter the cloned source tree, create a temporary build folder, and generate the solution and project files in it:

Build the generated solution in-place:

…and, optionally, run tests (note that for non-unit tests, preconfigured driver and DSN entries pointing to the binaries generated in this build folder must exist):
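Put together, the steps above might look like this in a Visual Studio Command Prompt (repo URL from the source; the generator architecture and build configuration are assumptions):

```shell
git clone --recursive https://github.com/ClickHouse/clickhouse-odbc.git
cd clickhouse-odbc
mkdir build
cd build

:: Generate the solution and project files
cmake -A x64 ..

:: Build the generated solution in-place
cmake --build . --config RelWithDebInfo

:: Optionally, run tests
ctest -C RelWithDebInfo
```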

Building from sources: macOS

You will need macOS 10.14 or later, Xcode 10 or later with Command Line Tools installed, as well as an up-to-date Homebrew available in the system.

Install Homebrew using the following command, and follow the printed instructions on any additional steps required to complete the installation:

Then, install the latest Xcode from the App Store. Open it at least once to accept the end-user license agreement and automatically install the required components.

Then, make sure that the latest Command Line Tools are installed and selected in the system:
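A sketch of those setup steps (the Homebrew install command is the one published by the Homebrew project):

```shell
# Install Homebrew
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

# Install and select the Command Line Tools
xcode-select --install
sudo xcode-select --switch /Applications/Xcode.app
sudo xcodebuild -license accept
```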

Build-time dependencies: iODBC

Execute the following in the terminal:

Build-time dependencies: UnixODBC

Execute the following in the terminal:

Clone the repo recursively with submodules:

Enter the cloned source tree, create a temporary build folder, and generate a Makefile for the project in it:

Build the generated solution in-place:

…and, optionally, run tests (note that for non-unit tests, preconfigured driver and DSN entries pointing to the binaries generated in this build folder must exist):
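On macOS and Linux, the clone/generate/build/test sequence above might look like this (build type assumed):

```shell
git clone --recursive https://github.com/ClickHouse/clickhouse-odbc.git
cd clickhouse-odbc
mkdir build && cd build

# Generate a Makefile for the project
cmake -DCMAKE_BUILD_TYPE=RelWithDebInfo ..

# Build in-place
make

# Optionally, run tests
ctest
```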

Building from sources: Red Hat/CentOS

Build-time dependencies: UnixODBC

Execute the following in the terminal:

Build-time dependencies: iODBC

Execute the following in the terminal:

All of the following commands must be issued in the same terminal session, right after this one command:

Clone the repo with submodules:

Enter the cloned source tree, create a temporary build folder, and generate a Makefile for the project in it:

Build the generated solution in-place:

…and, optionally, run tests (note that for non-unit tests, preconfigured driver and DSN entries pointing to the binaries generated in this build folder must exist):

Building from sources: Debian/Ubuntu

Build-time dependencies: UnixODBC

Execute the following in the terminal:

Build-time dependencies: iODBC

Execute the following in the terminal:

This assumes that the system cc and c++ point to compilers that satisfy the minimum requirements from Building from sources.

If the version of cmake is not recent enough, you can install a newer version by following the instructions from one of these pages:

Clone the repo with submodules:

Enter the cloned source tree, create a temporary build folder, and generate a Makefile for the project in it:

Build the generated solution in-place:

…and, optionally, run tests (note that for non-unit tests, preconfigured driver and DSN entries pointing to the binaries generated in this build folder must exist):

clickhouse-driver

Python driver for ClickHouse


Quickstart

This page gives a good introduction to clickhouse-driver. It assumes you already have clickhouse-driver installed. If you do not, head over to the Installation section.

A minimal working example looks like this:
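A sketch of that minimal example (host name assumed; requires a running ClickHouse server):

```python
from clickhouse_driver import Client

client = Client('localhost')
print(client.execute('SHOW TABLES'))
```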

This code will show all tables from ‘default’ database.

There are two conceptual types of queries:

Selecting dataВ¶

A simple SELECT query looks like:
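For instance (client construction repeated so the snippet is self-contained; host assumed):

```python
from clickhouse_driver import Client

client = Client('localhost')
rows = client.execute('SELECT * FROM system.numbers LIMIT 5')
print(rows)  # five single-element tuples
```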

Of course, queries can and should be parameterized to avoid SQL injection:
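clickhouse-driver uses %(name)s-style placeholders with a dict of parameters; a sketch (host assumed):

```python
from datetime import date
from clickhouse_driver import Client

client = Client('localhost')
client.execute(
    'SELECT %(date)s, %(a)s + %(b)s',
    {'date': date(2022, 6, 13), 'a': 1, 'b': 2},
)
```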

Selecting data with progress statisticsВ¶

Streaming resultsВ¶

When you are dealing with large datasets, block-by-block streaming of results may be useful:
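With execute_iter, rows are pulled lazily, block by block, instead of being buffered whole; max_block_size is a standard ClickHouse setting (host assumed):

```python
from clickhouse_driver import Client

client = Client('localhost')
rows_gen = client.execute_iter(
    'SELECT * FROM system.numbers LIMIT 10',
    settings={'max_block_size': 100000},
)
for row in rows_gen:  # rows are yielded one by one from each block
    print(row)
```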

Inserting dataВ¶

Insert queries in the Native protocol are a little bit tricky because of ClickHouse’s columnar nature — and because we’re using Python.

INSERT query consists of two parts: query statement and query values. Query values are split into chunks called blocks. Each block is sent in binary columnar form.

Since the data in each block is sent in binary form, we should not serialize it into a string using %(a)s substitution and then deserialize it back into Python types.

This INSERT would be extremely slow if executed with thousands of rows of data:
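A sketch of the slow pattern (table `test` assumed): every value is rendered into the query text and parsed back by the server.

```python
from clickhouse_driver import Client

client = Client('localhost')
# Slow: values are substituted into the SQL text one by one.
client.execute(
    'INSERT INTO test (x) VALUES (%(a)s), (%(b)s), (%(c)s)',
    {'a': 1, 'b': 2, 'c': 3},
)
```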

To insert data efficiently, provide data separately, and end your statement with a VALUES clause:
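The efficient form ends the statement at the VALUES clause and passes the data separately, so it travels in binary columnar blocks (table `test` assumed):

```python
from clickhouse_driver import Client

client = Client('localhost')
client.execute(
    'INSERT INTO test (x) VALUES',
    [{'x': 1}, {'x': 2}, {'x': 3}],  # any iterable of dicts, tuples or lists
)
```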

You can use any iterable yielding lists, tuples or dicts.

If data is not passed, connection will be terminated after a timeout.

The following WILL NOT work:
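A sketch of the unsupported mix (table `test` assumed): inlined placeholders in the VALUES clause combined with separately passed data.

```python
from clickhouse_driver import Client

client = Client('localhost')
# WILL NOT work: placeholders inside VALUES plus a separate data argument
client.execute(
    'INSERT INTO test (x) VALUES (%(a)s)',
    [{'a': 1}],
)
```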

Of course for INSERT … SELECT queries data is not needed:
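For example (table `test` assumed):

```python
from clickhouse_driver import Client

client = Client('localhost')
client.execute('INSERT INTO test (x) SELECT number FROM system.numbers LIMIT 3')
```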

ClickHouse will execute this query like a usual SELECT query.

DDL queries can be executed in the same way SELECT queries are executed:
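For example (host assumed):

```python
from clickhouse_driver import Client

client = Client('localhost')
client.execute('DROP TABLE IF EXISTS test')
client.execute('CREATE TABLE test (x Int32) ENGINE = Memory')
```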


This part of the documentation covers basic classes of the driver: Client, Connection and others.

Client

Client for communication with the ClickHouse server. A single connection is established per connected instance of the client.

Parameters: settings – dictionary of settings that is passed with every query. Defaults to None (no additional settings). See all available settings in the ClickHouse docs.

Disconnects from the server.

execute(query, params=None, with_column_types=False, external_tables=None, query_id=None, settings=None, types_check=False, columnar=False)

Establishes a new connection if one wasn’t established yet. After query execution the connection remains intact for subsequent queries. If the connection can’t be reused, it is closed and a new connection is created.

execute_iter(query, params=None, with_column_types=False, external_tables=None, query_id=None, settings=None, types_check=False)

New in version 0.0.14.

execute_with_progress(query, params=None, with_column_types=False, external_tables=None, query_id=None, settings=None, types_check=False)

Connection

Represents connection between client and ClickHouse server.

Closes the connection between server and client and frees resources, e.g. closes the socket.

QueryResult

Stores query result from multiple blocks.

get_result()

Returns: stored query result.

ProgressQueryResult

Stores query result and progress information from multiple blocks. Provides iteration over query progress.

get_result()

Returns: stored query result.

IterQueryResult

Provides iteration over returned data by chunks (streaming by chunks).

clickhouse-driver 0.2.4

pip install clickhouse-driver==0.2.4 Copy PIP instructions

Released: Jun 13, 2022

Python driver with native interface for ClickHouse


License: MIT License (MIT)

Tags ClickHouse, db, database, cloud, analytics

Requires: Python >=3.4

Maintainer: xzkostyan

Classifiers

Project description

ClickHouse Python Driver

ClickHouse Python Driver with native (TCP) interface support.

Asynchronous wrapper is available here: https://github.com/mymarilyn/aioch

Features

Documentation

Usage

There are two ways to communicate with the server:

Pure Client example:
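A sketch of both ways (host assumed): the pure Client, and the DB API 2.0 layer described later in this document.

```python
# Way 1: pure Client
from clickhouse_driver import Client

client = Client('localhost')
print(client.execute('SHOW DATABASES'))

# Way 2: DB API 2.0
from clickhouse_driver import dbapi

conn = dbapi.connect(host='localhost')
cursor = conn.cursor()
cursor.execute('SHOW DATABASES')
print(cursor.fetchall())
```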

License

ClickHouse Python Driver is distributed under the MIT license.


Infinidat/infi.clickhouse_orm


This project is a simple ORM for working with the ClickHouse database. It allows you to define model classes whose instances can be written to the database and read from it.

Let’s jump right in with a simple example of monitoring CPU usage. First we need to define the model class, connect to the database and create a table for the model:

Now we can collect usage statistics per CPU, and write them to the database:

Querying the table is easy, using either the query builder or raw SQL:
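A sketch of that whole workflow, following the CPU-usage example (the class name, field names and sample readings are assumptions; requires a running ClickHouse server):

```python
from datetime import datetime
from infi.clickhouse_orm import models, fields, engines
from infi.clickhouse_orm.database import Database

# Define the model and create its table
class CPUStats(models.Model):
    timestamp = fields.DateTimeField()
    cpu_id = fields.UInt16Field()
    cpu_percent = fields.Float32Field()
    engine = engines.Memory()

db = Database('demo')      # connects to http://localhost:8123 by default
db.create_table(CPUStats)

# Write one reading per CPU (numbers hypothetical)
db.insert(
    CPUStats(timestamp=datetime.utcnow(), cpu_id=i, cpu_percent=pct)
    for i, pct in enumerate([12.5, 73.1])
)

# Query via the query builder...
busy = CPUStats.objects_in(db).filter(cpu_percent__gt=50)
# ...or via raw SQL ($table expands to the model's table)
rows = db.select('SELECT cpu_id, cpu_percent FROM $table', model_class=CPUStats)
```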

This and other examples can be found in the examples folder.

To learn more please visit the documentation.


Async and multithreading

Every ClickHouse query is assigned an identifier to enable request execution tracking. However, ClickHouse native protocol is synchronous: all incoming queries are executed consecutively. Clickhouse-driver does not yet implement a connection pool.

To utilize ClickHouse’s asynchronous capability you should either use multiple Client instances or implement a queue.

The same thing is applied to multithreading. Queries from different threads can’t use one Client instance with single connection. You should use different clients for different threads.

However, if you are using the DB API for communication with the server, each cursor creates its own Client instance, which makes communication thread-safe.


DB API 2.0

This part of the documentation covers driver DB API.

clickhouse_driver.dbapi.connect(dsn=None, host=None, user='default', password='', port=9000, database='default', **kwargs)

Create a new database connection.

The connection can be specified via DSN:

or using database and credentials arguments:
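A sketch of both forms (host and credentials assumed):

```python
from clickhouse_driver import dbapi

# Via DSN
conn = dbapi.connect('clickhouse://localhost')

# Via database and credentials arguments
conn = dbapi.connect(
    host='localhost', port=9000,
    user='default', password='', database='default',
)
```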

The basic connection parameters are:

See defaults in Connection constructor.

DSN or host is required.

Any other keyword parameter will be passed to the underlying Connection class.

Returns:a new connection.

exception clickhouse_driver.dbapi.Warning
exception clickhouse_driver.dbapi.Error
exception clickhouse_driver.dbapi.DataError
exception clickhouse_driver.dbapi.DatabaseError
exception clickhouse_driver.dbapi.ProgrammingError
exception clickhouse_driver.dbapi.IntegrityError
exception clickhouse_driver.dbapi.InterfaceError
exception clickhouse_driver.dbapi.InternalError
exception clickhouse_driver.dbapi.NotSupportedError
exception clickhouse_driver.dbapi.OperationalError

Each of these inherits Exception.with_traceback(tb), which sets self.__traceback__ to tb and returns self.

Connection

Creates a new Connection for accessing the ClickHouse database.

A Connection is just a wrapper for handling multiple cursors (clients) and does not initiate an actual connection to the ClickHouse server.

Close the connection now. The connection will be unusable from this point forward; an Error (or subclass) exception will be raised if any operation is attempted with the connection. The same applies to all cursor objects trying to use the connection.

commit()

Does nothing, since ClickHouse has no transactions.

cursor()

Returns: a new Cursor object using the connection.

rollback()

Does nothing, since ClickHouse has no transactions.

Cursor

Close the cursor now. The cursor will be unusable from this point forward; an Error (or subclass) exception will be raised if any operation is attempted with the cursor.

Prepare and execute a database operation (query or command).

executemany(operation, seq_of_parameters)

Fetch all (remaining) rows of a query result, returning them as a sequence of sequences (e.g. a list of tuples).

Returns:list of fetched rows.

fetchmany(size=None)

Fetch the next set of rows of a query result, returning a sequence of sequences (e.g. a list of tuples). An empty sequence is returned when no more rows are available.

Parameters: size – number of rows to return.
Returns: list of fetched rows, or an empty list.

fetchone()

Fetch the next row of a query result set, returning a single sequence, or None when no more data is available.

Adds external table to cursor context.

If the same table is specified more than once the last one is used.

set_query_id(query_id)

Specifies the query identifier for the cursor.

Parameters: query_id – the query identifier.
Returns: None

set_settings(settings)

Specifies settings for the cursor.

Parameters: settings – dictionary of query settings.
Returns: None

set_stream_results(stream_results, max_row_buffer)

Toggles results streaming from the server. The driver will consume blocks of up to max_row_buffer rows and yield rows one by one from each block.

set_types_check(types_check)

Toggles type checking for sequences of INSERT parameters. Disabled by default.


gavinln/clickhouse-test


This project provides an Ubuntu (20.04) Vagrant virtual machine (VM) with ClickHouse. ClickHouse is an open-source column-oriented DBMS (columnar database management system) for online analytical processing (OLAP).

There are Ansible scripts that automatically install the software when the VM is started.

Setup the machine

All the software installed exceeds the standard 10GB size of the virtual machine disk. Install the following plugin to resize the disk.

clickhouse-driver

Python driver for ClickHouse


Quickstart¶

This page gives a good introduction to clickhouse-driver. It assumes you already have clickhouse-driver installed. If you do not, head over to the Installation section.

A minimal working example looks like this:

This code will show all tables from the 'default' database.

There are two conceptual types of queries:

Selecting data¶

A simple select query looks like:

Of course, queries can and should be parameterized to avoid SQL injection:

Percent symbols in inline constants must be doubled if you mix constants containing a % symbol with %(x)s parameters.
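The doubling rule mirrors Python's printf-style formatting, where %% renders a literal percent sign while %(name)s is a substitution slot. Plain Python shows the mechanics (the query text is a made-up example):

```python
# %% survives formatting as a literal %, while %(x)s is substituted.
template = "SELECT 'progress: 100%%', %(x)s"
rendered = template % {'x': 5}
assert rendered == "SELECT 'progress: 100%', 5"
```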

Customising SELECT output with the FORMAT clause is not supported.

Selecting data with progress statistics¶

Streaming results¶

When you are dealing with large datasets, block-by-block results streaming may be useful:
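A sketch of the streaming pattern: the Client.execute_iter call is commented out since it needs a live server (host name and query are assumptions); the lazy consumption loop is the part being illustrated:

```python
# from clickhouse_driver import Client
# client = Client('localhost')
# rows = client.execute_iter(
#     'SELECT number FROM system.numbers LIMIT 1000000',
#     settings={'max_block_size': 100000})
rows = iter(range(10))  # stand-in for the streamed result set

total = 0
for row in rows:  # rows arrive lazily, block by block, one row at a time
    total += 1
assert total == 10
```

Because rows are consumed as they arrive, memory use stays bounded by the block size rather than the full result set.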

Inserting data¶

Insert queries in the Native protocol are a little bit tricky because of ClickHouse's columnar nature, and because we're using Python.

An INSERT query consists of two parts: the query statement and the query values. Query values are split into chunks called blocks. Each block is sent in binary columnar form.

As data in each block is sent in binary form, we should not serialize it into a string using %(a)s substitution and then deserialize it back into Python types.

This INSERT would be extremely slow if executed with thousands of rows of data:

To insert data efficiently, provide the data separately, and end your statement with a VALUES clause:

You can use any iterable yielding lists, tuples or dicts.
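For instance, a generator works as the data argument and avoids building the full row list in memory. A hedged sketch (table and column names are invented; the driver call is commented out because it needs a live server):

```python
def gen_rows(n):
    """Yield (id, value) tuples lazily; no full list is built in memory."""
    for i in range(n):
        yield (i, 'value-%d' % i)

# from clickhouse_driver import Client
# client = Client('localhost')
# client.execute('INSERT INTO test (id, value) VALUES', gen_rows(100000))

sample = list(gen_rows(3))
assert sample == [(0, 'value-0'), (1, 'value-1'), (2, 'value-2')]
```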

If data is not passed, the connection will be terminated after a timeout.

The following WILL NOT work:

ClickHouse will execute this query like a usual SELECT query.

Inserting data in different formats with the FORMAT clause is not supported.

See Inserting data from CSV file if you need to insert data in a custom format.

DDL queries can be executed in the same way SELECT queries are executed:

Async and multithreading¶

Every ClickHouse query is assigned an identifier to enable request execution tracking. However, the ClickHouse native protocol is synchronous: all incoming queries are executed consecutively. Clickhouse-driver does not yet implement a connection pool.

To utilize ClickHouse's asynchronous capability you should either use multiple Client instances or implement a queue.

The same applies to multithreading. Queries from different threads can't share one Client instance with a single connection. You should use different clients for different threads.

However, if you are using the DB API for communication with the server, each cursor creates its own Client instance. This makes communication thread-safe.
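The one-client-per-thread rule can be sketched with a plain threading skeleton; the Client construction is commented out (it needs a live server), and the stand-in assignment marks where each thread's own query result would go:

```python
import threading

results = {}

def worker(thread_id, query):
    # Each thread must build its OWN client; sharing a single native-protocol
    # connection across threads is not safe.
    # client = Client('localhost')
    # results[thread_id] = client.execute(query)
    results[thread_id] = query  # stand-in for the query result

threads = [threading.Thread(target=worker, args=(i, 'SELECT %d' % i))
           for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
assert sorted(results) == [0, 1, 2, 3]
```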


Welcome to clickhouse-driver¶

Welcome to clickhouse-driver's documentation. Get started with Installation and then get an overview with the Quickstart, where common queries are described.

User's Guide¶

This part of the documentation focuses on step-by-step instructions for development with clickhouse-driver.

Clickhouse-driver is designed to communicate with the ClickHouse server from Python over the native protocol.

The ClickHouse server provides two protocols for communication: the HTTP protocol and the Native (TCP) protocol.

Each protocol has its own advantages and disadvantages. Here we focus on the advantages of the native protocol:

There is an asynchronous wrapper for clickhouse-driver: aioch. It's available here.

API Reference¶

If you are looking for information on a specific function, class or method, this part of the documentation is for you.

Additional Notes¶

Legal information, changelog and contributing are here for the interested.

Extremely slow on large select, HTTP protocol almost 10 times faster #32

Comments

dmitriyshashkin commented Mar 20, 2018

It seems that selecting large datasets using the native client is extremely slow. Here is my benchmark: https://gist.github.com/dmitriyshashkin/6a4849bdcf882ba340cdfbc1990da401

Initially I encountered this behavior on my own dataset, but I was able to reproduce it using the dataset and structure described here: https://clickhouse.yandex/docs/en/getting_started/example_datasets/ontime/

To simplify things a little bit I used the data for just one month: http://transtats.bts.gov/PREZIP/On_Time_On_Time_Performance_2017_12.zip

As you can see, the fastest way to get the data is by using the HTTP protocol with requests and pandas. The problem gets worse as the number of rows grows; on my own dataset with 5M rows I waited for 1 hour before I had to interrupt the process. The bottleneck is not ClickHouse itself: the `top` command shows that all the work is done by Python at 100% CPU utilization, while ClickHouse is almost idle.


xzkostyan commented Mar 21, 2018

I haven't tried to play with the provided data yet. But here is the explanation of the speed loss.

The HTTP client returns plain text (CSV) that should be parsed, for example with pandas. The native client returns Python types.

Pandas is a compiled library (correct me if I'm wrong); this driver is written in pure Python (except the compression and hashing libraries).

I'll try to cythonize some bottlenecks in the source code. The main bottleneck is transposing results from columnar to row-like form.

If your data processing is OK with the columnar form, you can specify the columnar=True parameter in the execute call. This will give a significant speedup.
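The transposition being avoided is essentially zip(*columns): with columnar=True the driver hands back one tuple per column and skips building a tuple per row. A tiny self-contained illustration (sample data is invented):

```python
# Result of a query in columnar form: one tuple per column.
columns = [(1, 2, 3), ('a', 'b', 'c')]

# Row-oriented form requires transposing; building one tuple per row is the
# pure-Python cost that columnar=True lets you skip.
rows = list(zip(*columns))
assert rows == [(1, 'a'), (2, 'b'), (3, 'c')]
```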


Development¶

Test configuration¶

In setup.cfg you can find the ClickHouse server port, credentials, logging level and other options that can be tuned during local testing.

Running tests locally¶

Install the desired Python version with your system package manager/pyenv/another manager.

Install the test requirements and build the package:

You should install cython if you want to change *.pyx files:

ClickHouse on the host machine¶

Install the desired versions of clickhouse-server and clickhouse-client on your machine.

ClickHouse in docker¶

Create a container with the desired version of clickhouse-server:

Create a container with the same version of clickhouse-client:

Create a clickhouse-client script on your host machine:

After that, the test-clickhouse-client container will communicate with test-clickhouse-server transparently from the host machine.

Add an entry to the hosts file:

Set TZ=UTC and run the tests:

GitHub Actions in a forked repository¶

Workflows in forked repositories can be used for running tests.

Workflows don't run in forked repositories by default. You must enable GitHub Actions in the Actions tab of the forked repository.


Installation¶

Python Version¶

Clickhouse-driver supports Python 3.4 and newer and PyPy.

Build Dependencies¶

Example for the python:alpine docker image:

By default there are wheels for Linux, Mac OS X and Windows.

Packages for Linux and Mac OS X are available for Python 3.4 – 3.9.

Packages for Windows are available for Python 3.5 – 3.9.

Dependencies¶

These distributions will be installed automatically when installing clickhouse-driver.

Optional dependencies¶

These distributions will not be installed automatically. Clickhouse-driver will detect and use them if you install them.

Installation from PyPI¶

The package can be installed using pip:

You can install extra packages if you need compression support. Example of LZ4 compression requirements installation:

You can also specify multiple extras, separated by commas. Install the LZ4 and ZSTD requirements:

NumPy support¶

You can install additional packages (NumPy and Pandas) if you need NumPy support:

Supported NumPy versions are limited by the numpy package's Python support.

Installation from GitHub¶

The development version can be installed directly from GitHub:



Performance¶

This section compares clickhouse-driver performance over the Native interface with the TSV and JSONEachRow formats available over the HTTP interface.

clickhouse-driver returns already-parsed row items in Python data types. The driver performs all transformations for you.

When you read data over HTTP you may need to cast strings into Python types.

Test data¶

Sample data for testing is taken from the ClickHouse docs.

Create the database and table:

Download some data for the year 2017:

Insert the data into ClickHouse:

Required packages¶

For fast JSON parsing we'll use the ujson package:

Versions¶

Machine: Linux ThinkPad-T460 4.4.0-177-generic #207-Ubuntu SMP Mon Mar 16 01:16:10 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

Python: CPython 3.6.5 (default, May 30 2019, 14:48:31) [GCC 5.4.0 20160609]

Benchmarking¶

The scripts below can be benchmarked with the following one-liner:

time will measure:

Plain text without parsing¶

Let's take the plain-text response from the ClickHouse server as a baseline.

Fetching unparsed data with plain requests (1)

Parsed rows¶

A line split into elements will be considered "parsed" for the TSV format (2)

Now we cast each element to its data type (2.5)

The JSONEachRow format can be loaded with json loads (3)

Get fully parsed rows with clickhouse-driver in Native format (4)
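The original snippets were not preserved in this copy. As a hedged reconstruction of the two HTTP-side parsing styles being timed — TSV split-and-cast (2.5-style) and JSONEachRow via json loads (3-style; the benchmark used ujson, a faster drop-in replacement) — with field names that are my own illustration, not the benchmark's code:

```python
import json

# TSV: split each tab-separated line, then cast fields by hand.
tsv_line = '2017-12-01\tAA\t19805\n'
flight_date, carrier, airline_id = tsv_line.rstrip('\n').split('\t')
tsv_row = (flight_date, carrier, int(airline_id))
assert tsv_row == ('2017-12-01', 'AA', 19805)

# JSONEachRow: one JSON object per line, parsed with json.loads.
json_line = '{"FlightDate": "2017-12-01", "Carrier": "AA", "AirlineID": 19805}\n'
json_row = json.loads(json_line)
assert json_row['AirlineID'] == 19805
```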

Iteration over rows¶

Iteration over TSV (5)

Now we cast each element to its data type (5.5)

Iteration over JSONEachRow (6)

Iteration over rows with clickhouse-driver in Native format (7)

Iteration over string rows¶

OK, but what if we need only string columns?

Iteration over TSV (8)

Iteration over JSONEachRow (9)

Iteration over string rows with clickhouse-driver in Native format (10)

Iteration over int rows¶

Iteration over TSV (11)

Iteration over JSONEachRow (12)

Iteration over int rows with clickhouse-driver in Native format (13)

Results¶

This table contains memory and timing benchmark results of the snippets above.

JSON in the table is shorthand for JSONEachRow.

| Rows | 50k | 131k | 217k | 450k | 697k |
| --- | --- | --- | --- | --- | --- |
| Plain text without parsing: timing | | | | | |
| Naive requests.get TSV (1) | 0.40 s | 0.67 s | 0.95 s | 1.67 s | 2.52 s |
| Naive requests.get JSON (1) | 0.61 s | 1.23 s | 2.09 s | 3.52 s | 5.20 s |
| Plain text without parsing: memory | | | | | |
| Naive requests.get TSV (1) | 49 MB | 107 MB | 165 MB | 322 MB | 488 MB |
| Naive requests.get JSON (1) | 206 MB | 564 MB | 916 MB | 1.83 GB | 2.83 GB |
| Parsed rows: timing | | | | | |
| requests.get TSV (2) | 0.81 s | 1.81 s | 3.09 s | 7.22 s | 11.87 s |
| requests.get TSV with cast (2.5) | 1.78 s | 4.58 s | 7.42 s | 16.12 s | 25.52 s |
| requests.get JSON (3) | 2.14 s | 5.65 s | 9.20 s | 20.43 s | 31.72 s |
| clickhouse-driver Native (4) | 0.73 s | 1.40 s | 2.08 s | 4.03 s | 6.20 s |
| Parsed rows: memory | | | | | |
| requests.get TSV (2) | 171 MB | 462 MB | 753 MB | 1.51 GB | 2.33 GB |
| requests.get TSV with cast (2.5) | 135 MB | 356 MB | 576 MB | 1.15 GB | 1.78 GB |
| requests.get JSON (3) | 139 MB | 366 MB | 591 MB | 1.18 GB | 1.82 GB |
| clickhouse-driver Native (4) | 135 MB | 337 MB | 535 MB | 1.05 GB | 1.62 GB |
| Iteration over rows: timing | | | | | |
| requests.get TSV (5) | 0.49 s | 0.99 s | 1.34 s | 2.58 s | 4.00 s |
| requests.get TSV with cast (5.5) | 1.38 s | 3.38 s | 5.40 s | 10.89 s | 16.59 s |
| requests.get JSON (6) | 1.89 s | 4.73 s | 7.63 s | 15.63 s | 24.60 s |
| clickhouse-driver Native (7) | 0.62 s | 1.28 s | 1.93 s | 3.68 s | 5.54 s |
| Iteration over rows: memory | | | | | |
| requests.get TSV (5) | 19 MB | 19 MB | 19 MB | 19 MB | 19 MB |
| requests.get TSV with cast (5.5) | 19 MB | 19 MB | 19 MB | 19 MB | 19 MB |
| requests.get JSON (6) | 20 MB | 20 MB | 20 MB | 20 MB | 20 MB |
| clickhouse-driver Native (7) | 56 MB | 70 MB | 71 MB | 71 MB | 71 MB |
| Iteration over string rows: timing | | | | | |
| requests.get TSV (8) | 0.40 s | 0.67 s | 0.80 s | 1.55 s | 2.18 s |
| requests.get JSON (9) | 1.14 s | 2.64 s | 4.22 s | 8.48 s | 12.96 s |
| clickhouse-driver Native (10) | 0.46 s | 0.91 s | 1.35 s | 2.49 s | 3.67 s |
| Iteration over string rows: memory | | | | | |
| requests.get TSV (8) | 19 MB | 19 MB | 19 MB | 19 MB | 19 MB |
| requests.get JSON (9) | 20 MB | 20 MB | 20 MB | 20 MB | 20 MB |
| clickhouse-driver Native (10) | 46 MB | 56 MB | 57 MB | 57 MB | 57 MB |
| Iteration over int rows: timing | | | | | |
| requests.get TSV (11) | 0.84 s | 2.06 s | 3.22 s | 6.27 s | 10.06 s |
| requests.get JSON (12) | 0.95 s | 2.15 s | 3.55 s | 6.93 s | 10.82 s |
| clickhouse-driver Native (13) | 0.43 s | 0.61 s | 0.86 s | 1.53 s | 2.27 s |
| Iteration over int rows: memory | | | | | |
| requests.get TSV (11) | 19 MB | 19 MB | 19 MB | 19 MB | 19 MB |
| requests.get JSON (12) | 20 MB | 20 MB | 20 MB | 20 MB | 20 MB |
| clickhouse-driver Native (13) | 41 MB | 48 MB | 48 MB | 48 MB | 49 MB |

Conclusion¶

If you need to get a significant number of rows from the ClickHouse server as text, then the TSV format is your choice. See the Iteration over string rows results.

It doesn't matter which interface you use if you manipulate a small number of rows.

clickhouse-driver

Python driver with native interface for ClickHouse


Popularity

Total Weekly Downloads (293,670)

Direct Usage Popularity

The PyPI package clickhouse-driver receives a total of 293,670 downloads a week. As such, we scored clickhouse-driver's popularity level as "influential project".

Based on project statistics from the GitHub repository for the PyPI package clickhouse-driver, we found that it has been starred 900 times, and that 0 other projects in the ecosystem depend on it.

The download numbers shown are the average weekly downloads from the last 6 weeks.

Security


Security and license risk for latest version

We found a way for you to contribute to the project! Looks like clickhouse-driver is missing a security policy.

You can connect your project’s repository to Snyk to stay up to date on security alerts and receive automatic fix pull requests.

Maintenance

Commit Frequency

Further analysis of the maintenance status of clickhouse-driver based on released PyPI versions cadence, the repository activity, and other data points determined that its maintenance is Healthy.

We found that clickhouse-driver demonstrates a positive version release cadence with at least one new version released in the past 3 months.

As a healthy sign for on-going project maintenance, we found that the GitHub repository had at least 1 pull request or issue interacted with by the community.

Community

With more than 10 contributors to the clickhouse-driver repository, this is possibly a sign of a growing and inviting community.

We found a way for you to contribute to the project! Looks like clickhouse-driver is missing a Code of Conduct.



ClickHouse/dbt-clickhouse


This plugin ports dbt functionality to ClickHouse.

We do not test against older versions of ClickHouse. The plugin uses syntax that requires version 22.1 or newer.

Use your favorite Python package manager to install the app from PyPI, e.g.

| Option | Description | Required? |
| --- | --- | --- |
| engine | The table engine (type of table) to use when creating tables | Optional (default: MergeTree()) |
| order_by | A tuple of column names or arbitrary expressions. This allows you to create a small sparse index that helps find data faster. | Optional (default: tuple()) |
| partition_by | A partition is a logical combination of records in a table by a specified criterion. The partition key can be any expression from the table columns. | Optional |
| unique_key | A tuple of column names that uniquely identify rows. For more details on uniqueness constraints, see here. | Optional |
| inserts_only | Relevant only for incremental materialization. If set to True, incremental updates will be inserted directly into the target table without creating an intermediate table. This can significantly improve performance and avoid memory limitations on big updates. | Optional |
| settings | A dictionary with custom settings for INSERT INTO and CREATE AS SELECT queries. | Optional |

Note: the only feature that is not supported and not tested is Ephemeral materialization.

Run the tests with: pytest tests/integration

You can customize a few test parameters through environment variables. To provide custom parameters, create a test.env file under the project root (remember not to commit this file!) and define the following environment variables inside:

ClickHouse wants to thank @silentsokolov for creating this connector and for their valuable contributions.

About

The Clickhouse plugin for dbt (data build tool)

clickhouse-driver

Python driver for ClickHouse

Navigation

Related Topics

Quick search

Welcome to clickhouse-driverВ¶

Welcome to clickhouse-driver’s documentation. Get started with Installation and then get an overview with the Quickstart where common queries are described.

User’s Guide¶

This part of the documentation focuses on step-by-step instructions for development with clickhouse-driver.

Clickhouse-driver is designed to communicate with ClickHouse server from Python over native protocol.

ClickHouse server provider two protocols for communication: HTTP protocol and Native (TCP) protocol.

Each protocol has own advantages and disadvantages. Here we focus on advantages of native protocol:

There is an asynchronous wrapper for clickhouse-driver: aioch. It’s available here.

API ReferenceВ¶

If you are looking for information on a specific function, class or method, this part of the documentation is for you.

Additional NotesВ¶

Legal information, changelog and contributing are here for the interested.

ClickHouse Python Driver with native interface support


Overview

ClickHouse Python Driver

ClickHouse Python Driver with native (TCP) interface support.

Asynchronous wrapper is available here: https://github.com/mymarilyn/aioch

There are two ways to communicate with the server:

Pure Client example:

ClickHouse Python Driver is distributed under the MIT license.

Issues

Fix null value on bytestring columns

When the client setting strings_as_bytes is set, the driver crashes when inserting None values into columns of type Nullable([Fixed]String):

fallback for os_name if user name is not defined

I got this error while running inside a docker container (no user entry for such uid).

Add max_partitions_per_insert_block to settings.available

The max_partitions_per_insert_block is defined in: https://github.com/yandex/ClickHouse/blob/f566182582c70986be19777b3583c803607928ad/dbms/src/Core/Settings.h#L315


Enum option parsing is not handling all supported characters correctly

When querying a table with Enum options containing a comma and a space, the parsing of the options fails (see below).

With an example table as

the options are a bit non-standard but seem to be actually permitted. (I kind of created these options by accident due to a typo in a query and then figured out that the parsing could be improved for this case.)

And while testing the original parsing, I also noticed that it does not really handle any empty characters before the first option; they get prepended to the first option, e.g., Enum8( 'one' = 1, 'exa"mple' = 2, 'three' = 3) is turned into {" 'one": 1, 'exa"mple': 2, 'three': 3}, which doesn't seem right either.
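One quote-aware way to tokenize such options is to match the quoted name and its numeric value together, so commas and escaped quotes inside the name can't confuse a naive split. This is a sketch of the idea, not the driver's actual parser (regex and function name are my own):

```python
import re

def parse_enum_options(spec):
    """Extract {'name': value} pairs from an Enum type definition,
    tolerating commas, spaces and escaped quotes inside option names."""
    pairs = re.findall(r"'((?:[^'\\]|\\.)*)'\s*=\s*(-?\d+)", spec)
    return {name.replace("\\'", "'"): int(value) for name, value in pairs}

spec = "Enum8('one' = 1, 'two, three' = 2, 'exa\\'mple' = 3)"
assert parse_enum_options(spec) == {'one': 1, 'two, three': 2, "exa'mple": 3}
```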

Do you agree that it makes sense to fix the options parsing? Should I add some tests for it?

In addition, I've added escaping of single quotes to the generated error message.

Get progress info

It seems the Progress packets are received and managed, but there is no way to get the info from the Client or Connection objects. Here is an API proposal with a fetch* method; this is common in database APIs.

Last query’s profile info

Hi, is there an easy way to read

I have tried reading query profile info but I could only get

I am missing 3 more measurements

Question:

BTW1: I can process system.query_log, but this is a cumbersome approach.
BTW2: This could be a nice feature to have, i.e. adding an option to display profile info from Client.execute().

Wrong DateTime insert

After inserting datetime.datetime(2018, 1, 19, 10) through this driver I see the value '2018-01-19 13:00:00' in the table. The timezone on my computer and on the ClickHouse server is Moscow.

What must I do to see '2018-01-19 10:00:00' after the insert?
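One general way to remove this kind of ambiguity is to pass timezone-aware datetime values, so the instant you mean is explicit rather than interpreted against a server or client default. A stdlib sketch (the table name and driver call are assumptions, commented out since they need a live server):

```python
from datetime import datetime, timedelta, timezone

moscow = timezone(timedelta(hours=3))  # Moscow is UTC+3 (no DST since 2014)
dt = datetime(2018, 1, 19, 10, tzinfo=moscow)
assert dt.utcoffset() == timedelta(hours=3)

# With an aware value the intended instant is unambiguous:
# client.execute('INSERT INTO t (ts) VALUES', [(dt,)])  # needs a server
```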

Feature request: Extend columnar form to support NumPy / PyArrow arrays

As far as I understand there are two ways to do this: either turn Python tuples into NumPy arrays, if possible with zero copy, or do the transformation directly on the binary data.

The bonus will be that another zero-copy transformation of NumPy arrays to PyArrow arrays can be easily done. This way we gain two significant advantages:

Use PyArrow for batch processing, and for zero-copy transformations to tables and pandas dataframes; it's blazing fast and memory efficient.

It opens the door to the Arrow Flight protocol (gRPC-based), which can be great for transferring data at high speed from remote servers.

I would also like to use columnar forms with NumPy arrays for my project, and I am offering testing.

expected Hello or Exception, got Unknown packet

Describe the bug Client throws this error when running queries.

To Reproduce

Versions Python 3.9.6 clickhouse-driver built from commit 78e389e36d20744c236c546ee01ee76d5bc5fb35 Clickhouse server version 21.10.1 revision 54449

how to connect to remote clickhouse server

Actually I think it's a really useful tool, but the documentation is so poor. The introduction provides both client and connector examples, but all of these are toy examples, just as below:

client = Client('localhost') conn = connect('clickhouse://localhost')

How to build a connection in a real production environment is not mentioned. What about remote situations? How do you configure clickhouse-server? Which parameters are needed for the client and connector APIs? None of these are clearly provided. So I think this project is built from your daily work, but the project has got 0.5k stars; I got confused.

A better choice is https://github.com/ClickHouse/clickhouse-go

Boolean data type upload problem

Describe the bug Hi, I upload some data to ClickHouse with clickhouse-driver. My data types include Boolean; the Python script runs successfully, but the data in my database is not correct.

The error is as below:

To Reproduce Minimal piece of Python code that reproduces the problem:

CREATE TABLE IF NOT EXISTS paper ( has_inbound_citations Nullable(Bool), has_outbound_citations Nullable(Bool) ) engine = Memory

INSERT INTO paper (has_inbound_citations, has_outbound_citations) VALUES

Expected behavior A clear and concise description of what you expected to happen. The values seem to be 'true' or 'false' in the database, but errors are given instead.

Versions

python 3.10 clickhouse-driver 0.2.4 SELECT version()

Insert dataframe writes max datetime (2106-02-07) when it is None in the df

Describe the bug When I insert a pandas dataframe with null/None/np.nan datetime columns, I get the max datetime value (2106-02-07) in ClickHouse, although I need 1970-01-01.

To Reproduce

Expected behavior It should return 1970-01-01 instead of 2106-02-07.

Versions

clickhouse-driver 0.2.4 clickhouse 22.3.9.19 python 3.9.7

Failed enum insert when types_check enabled

Describe the bug The documentation states that the supported types for enum inserts are: Enum, int, long, str/basestring. https://clickhouse-driver.readthedocs.io/en/latest/types.html#enum8-16 Executing the example from the documentation works fine, except when types_check is enabled; then it fails with the following message:

To Reproduce

Expected behavior insert without error

Versions

query_dataframe returns an empty dataframe with shape (0, 0) instead of shape (0, number of columns)

Describe the bug When the query returns 0 rows, the function returns an empty dataframe with shape (0, 0), without specifying any columns from the query.

IMHO, in that case it should return a dataframe with shape (0, number of columns).

To Reproduce

Expected behavior

insert_dataframe fails with a KeyError for nullable columns

Describe the bug If the ClickHouse table has some nullable columns, and we don't add those columns to the data frame before uploading it with client.insert_dataframe, it fails with a KeyError.

To Reproduce

Expected behavior The API should write to the ClickHouse table, leaving the nullable columns with NULL values.

Versions

Stacktrace: 2022-06-15T09:43:29.384183066Z stderr F raise KeyError(key) from err

How to speed up inserts from a pandas dataframe? #76

Comments

ghuname commented Feb 21, 2019

I have a pandas dataframe on my laptop with a few million records. I am inserting them into a ClickHouse table with:
client.execute('insert into database.table (col1, col2…, coln) values', df.values.tolist())

After executing this command I looked at my laptop's network activity.

[Screenshot: network activity graph]

As you can see, network activity peaks at up to 12 Mbps, with lows at 6 Mbps.
Such activity takes quite a long time, and then at one moment the laptop's network send goes up to 100 Mbps for a short period and the insert is over.

Can someone explain how insert works in the ClickHouse driver?
Why is the data not going to the ClickHouse server at top network speed?

I tried to play with settings like max_insert_block_size or insert_block_size, but with no success.
Are there any ClickHouse server parameters that could improve the speed of inserts?

What would be the fastest way to insert a pandas dataframe into a ClickHouse table?

clickhouse-driver

Python driver for ClickHouse

Navigation

Related Topics

Quick search

Welcome to clickhouse-driverВ¶

Welcome to clickhouse-driver’s documentation. Get started with Installation and then get an overview with the Quickstart where common queries are described.

User’s Guide¶

This part of the documentation focuses on step-by-step instructions for development with clickhouse-driver.

Clickhouse-driver is designed to communicate with ClickHouse server from Python over native protocol.

ClickHouse server provider two protocols for communication: HTTP protocol and Native (TCP) protocol.

Each protocol has own advantages and disadvantages. Here we focus on advantages of native protocol:

There is an asynchronous wrapper for clickhouse-driver: aioch. It’s available here.

API Reference¶

If you are looking for information on a specific function, class or method, this part of the documentation is for you.

Additional Notes¶

Legal information, changelog and contributing are here for the interested.

clickhouse-http-client 1.0.2

pip install clickhouse-http-client Copy PIP instructions

Released: Jun 30, 2021

clickhouse http client, author liyuanjun

Navigation

Project links

Statistics

View statistics for this project via Libraries.io, or by using our public dataset on Google BigQuery

License: MIT

Author: liyuanjun

Maintainers

Project description

clickhouse-http-client

clickhouse http client.

Install

Usage


ppodolsky/clickhouse-python

Models are defined in a way reminiscent of Django’s ORM:

The main object you are interacting with is Database:

Topology is just a special object wrapping hosts that also introduces host priorities. How to prepare a topology is described in the next section. If necessary, you can specify credentials:

ClickHouse is optimized for bulk inserts, and we've implemented embedded buffering here to avoid single-row inserts. Every model (table) has its own buffer, and the buffer size defines how many instances of the model must be collected in the buffer before a real insert happens. If you need more predictable inserts, you can always use db.flush(), which sends all collected instances immediately, or even set buffer_size=0 to flush on every insert. Buffering is disabled by default; to use it you must set an appropriate buffer_size:

The rule of thumb for choosing a buffer size is to pick one such that the buffer overflows about every second. The database client can be thread-safe: pass threaded=True when creating the Database object. You can create a separate thread that flushes every second, or insert from multiple threads.

Describing topology of ClickHouse cluster

This wrapper tends to support multi DC strategies. Topology can be described in the following format:

where the keys in the dictionary are the priorities of the corresponding host lists; lower values mean higher priority. In the topology above, requests will always be sent to one of host1, host2, host3 (chosen randomly each time). Hosts with priority 2 will only come into play if all hosts with priority 1 go down.

Assume there are two data centers, DC-1 and DC-2, and the code is running on a host in DC-1.

There is a helper that produces a topology in the required format from a more human-readable one. The code below produces the same result as above:
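A hypothetical sketch of such a helper (the real function name and signature in the project may differ): it takes the hosts of the local and remote data centers and produces the priority dictionary described above.

```python
def make_topology(local_hosts, remote_hosts):
    """Build the priority dict: lower keys are tried first."""
    return {1: list(local_hosts), 2: list(remote_hosts)}

topology = make_topology(
    ['host1', 'host2', 'host3'],  # DC-1, where the code runs
    ['host4', 'host5'],           # DC-2, used only as a fallback
)
print(topology)  # {1: ['host1', 'host2', 'host3'], 2: ['host4', 'host5']}
```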

ClickHouse and Python: Jupyter Notebooks

Jupyter Notebooks are an indispensable tool for sharing code between users in Python data science. For those unfamiliar with them, notebooks are documents that contain runnable code snippets mixed with documentation. They can invoke Python libraries for numerical processing, machine learning, and visualization. The code output includes not just text output but also graphs from powerful libraries like matplotlib and seaborn. Notebooks are so ubiquitous that it’s hard to think of manipulating data in Python without them.

ClickHouse support for Jupyter Notebooks is excellent. I have spent the last several weeks playing around with Jupyter Notebooks using two community drivers: clickhouse-driver and clickhouse-sqlalchemy. The results are now published on Github at https://github.com/Altinity/clickhouse-python-examples. The remainder of this blog contains tips to help you integrate ClickHouse data into your notebooks.

Driver Installation

You can run Jupyter Notebooks directly from the command line but like most people I run them using Anaconda. We’ll assume you know how to run Jupyter from Anaconda Navigator. (If not, read the Anaconda docs and come back.) To use the ClickHouse drivers you’ll want to run conda commands similar to the following to bring them into your environment. This example uses the ‘base’ environment.
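For example (package names assumed to be those published on conda-forge):

```shell
conda install -c conda-forge clickhouse-driver
conda install -c conda-forge clickhouse-sqlalchemy
```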

Now when you start Jupyter with the ‘base’ environment you’ll have ClickHouse drivers available for import. Tip: you can run these commands to load modules while Jupyter is already running. I do this regularly to top up missing libraries.

There are other Python drivers available such as the sqlalchemy-clickhouse driver developed by Marek Vavrusa and others. However, the drivers shown above are available on conda-forge which makes them easy to use with Anaconda.

So much for installation. Let’s put the drivers to use.

Shortest Path to Data

The easiest way to work with data from ClickHouse is via the SQLAlchemy %sql magic function. There is a sample notebook that shows how to do this easily. For now, let's step through the recipe, since this is likely the most common way users will access data from ClickHouse.

First, let’s load SQLAlchemy and enable the %sql function.

Next, let’s connect to ClickHouse and fetch data from the famous Iris data set into a pandas data frame. The last command shows the end of the frame so we can confirm it has data.

Finally, let’s create a nice scatter graph with some of the data. This code is the most complex by far but generates a nice picture showing the overlap between characteristics of the three Iris species.

The result is the very satisfactory graph shown below.

Python clickhouse driver. db938 iris scatter graph. Python clickhouse driver фото. Python clickhouse driver-db938 iris scatter graph. картинка Python clickhouse driver. картинка db938 iris scatter graph. pip install clickhouse-connect Copy PIP instructions

For more details and to run the sample yourself check out the source notebook file.

Translating Data Types

One of the issues you'll need to watch for in your own work is ensuring that pandas data frames have the correct data types, especially for numbers. If your SQL schema sticks to ints and floats, values will convert easily in result sets. More specialized types like Decimal do not automatically convert to numeric types, which means that libraries like matplotlib and scikit-learn won't be able to use them correctly. Here's an example of properly conforming DDL for the iris table:
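A sketch of such DDL, with column names assumed from the standard Iris data set, shown here as a Python string so it could be sent through a clickhouse-driver Client:

```python
# Numeric-friendly schema; column names are assumptions based on the
# standard Iris data set, not taken from the original article.
ddl = """
CREATE TABLE iris (
    sepal_length Float64,
    sepal_width  Float64,
    petal_length Float64,
    petal_width  Float64,
    species      String
) ENGINE = MergeTree ORDER BY species
"""
# client.execute(ddl)  # requires a clickhouse-driver Client and a server
```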

It’s a good idea to run DataFrame.describe() on data frames created from SQL to ensure you got it right and that values have the expected types.

The key thing to check for is that numeric columns are really numbers and not 'object' or 'str' values. You'll of course notice problems as soon as you try to put values in a graph or feed them to numerical libraries. For example, matplotlib does not correctly plot X and Y axes for non-numeric data. That said, the root cause can be confusing to diagnose if you have not seen it before.

Pandas has methods that allow you to patch up mismatched types but it’s easier to get things right in the schema to begin with.
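For instance, a Decimal column coming back from a result set can be coerced to float before plotting; this stdlib-only sketch mimics what you would do to one column of query output:

```python
from decimal import Decimal

# A column as it might arrive from a Decimal-typed SQL result set.
raw = [Decimal('5.1'), Decimal('4.9'), Decimal('6.3')]

# Coerce to float so matplotlib/scikit-learn treat it as numeric.
numeric = [float(v) for v in raw]
print(numeric)  # [5.1, 4.9, 6.3]
```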

Direct Use of ClickHouse Drivers

The %sql function is great if you are just accessing data and need to get it into a data frame. But what if you want to do more than look at query results? %sql cannot run DDL or insert values. In this case you can import clickhouse-driver and clickhouse-sqlalchemy entities and call them directly from notebook code. Here's a trivial example:

We documented use of the clickhouse-driver in depth in a previous Altinity blog article. You can look there for a general overview of the driver. The EX-1.0-Getting-to-Know-the-Clickhouse-driver-Client.ipynb notebook contains samples showing how to run DDL, select data, and load CSV.

Use of the clickhouse-sqlalchemy driver is illustrated in the EX-2-ClickHouse-SQL-Alchemy.ipynb notebook. We have not done a full review of the driver, but based on initial experience it seems to work as well as the clickhouse-driver module, on which it depends. The main committer is Konstantin Lebedev (@xzkostyan), who also developed clickhouse-driver. You can also look at the documentation in the Github project. Between the notebook samples and the project README, users who have previously used SQLAlchemy should have little problem understanding it.

Relatively few problems popped up during notebook development. I have not run into driver operations that work elsewhere but fail in Jupyter. Driver behavior in Jupyter appears 100% equivalent to running Python3 from the command line. We expect this of course but it’s still good when it happens. The most interesting problems so far were related to data conversions, which are a typical integration issue.

Lessons from Jupyter and ClickHouse

There is a natural symbiosis between ClickHouse and Python libraries like pandas and scikit-learn. Notebooks are very helpful for exploring the relationship in a systematic way.

Over the last few weeks I have noticed ways to combine capabilities from both sides effectively. Here are two simple examples that popped up relating to pandas data frames.

Going from SQL to Pandas. Data frames can manipulate data in ways that are difficult to do in ClickHouse. For example, you can select normalized array data from ClickHouse into a data frame, then use the DataFrame.pivot_table() method to pivot rows and columns. See EX-4-Pivot-Using-SQL-And-Pandas.ipynb for an example of how to do this.

Going from Pandas to SQL. I documented CSV loading in clickhouse-driver using csv.DictReader in my last blog article. It turns out that pandas has a much better CSV reader than the native Python csv module. Among other things, it converts numeric types automatically. This is now part of the clickhouse-driver notebook.
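The difference is easy to see with the stdlib csv module, which hands every field back as a string and leaves numeric conversion to you, whereas pandas.read_csv infers the types automatically:

```python
import csv
import io

data = io.StringIO("sepal_length,species\n5.1,setosa\n4.9,setosa\n")
rows = list(csv.DictReader(data))
print(type(rows[0]['sepal_length']))  # <class 'str'>: not converted

# Manual conversion step that pandas would do for you:
for row in rows:
    row['sepal_length'] = float(row['sepal_length'])
```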

I’m sure there are many other ways to use Jupyter Notebook creatively with ClickHouse. If you have additional samples or see problems with those already there, please submit a PR on Github. Having a centrally located library of nice Python samples for ClickHouse will help all users.

whisklabs/airflow-clickhouse-plugin

Airflow ClickHouse Plugin

Installation and dependencies

Requires apache-airflow and clickhouse-driver (installed automatically by pip). Primarily supports Airflow 2.0–2.3. Later versions are expected to work properly but may not be fully tested. Use plugin versions below 0.6.0 (e.g. 0.5.7.post1) to preserve compatibility with Airflow 1.10.6 (that version has long-term support on Google Cloud Composer).

Note on pandas dependency

To import ClickHouseOperator use: from airflow_clickhouse_plugin.operators.clickhouse_operator import ClickHouseOperator

The result of the last query is pushed to XCom.

To import ClickHouseHook use: from airflow_clickhouse_plugin.hooks.clickhouse_hook import ClickHouseHook

Supported kwargs of the constructor (__init__ method):

Supports all the methods of the Airflow BaseHook including:

The sensor fully inherits from the Airflow SQLSensor and therefore fully implements its interface, using ClickHouseHook to fetch the SQL execution result; it supports templating of the sql argument.

How to create an Airflow connection to ClickHouse

As the type of the new connection, choose SQLite. The host field should be set to the ClickHouse host's IP or domain name.

There is no special ClickHouse connection type yet, so we use SQLite as the closest one.

If you use a secure connection to ClickHouse (this requires additional configuration on the ClickHouse side), set extra to {"secure": true}.

ClickHouse Connection schema

clickhouse_driver.Client is initialized with attributes stored in Airflow Connection attributes. The mapping of the attributes is listed below:

Airflow Connection attribute    Client.__init__ argument
host                            host
port                            port
schema                          database
login                           user
password                        password
extra                           **kwargs

database argument of ClickHouseOperator or ClickHouseHook overrides schema attribute of the Airflow connection.

For example, if the Airflow connection contains extra={"secure": true}, then Client.__init__ will receive the secure=True keyword argument in addition to the other non-empty connection attributes.

If an Airflow connection attribute is not set, it is not passed to the Client at all. In that case the default value of the corresponding clickhouse_driver.Connection argument is used (e.g. user defaults to 'default').

This means that the Airflow ClickHouse Plugin does not itself define any default values for the ClickHouse connection. You may fully rely on the default values of the clickhouse-driver version you use. The only exception is host: if that attribute of the Airflow connection is not set, 'localhost' is used.
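A minimal sketch of this attribute mapping (the helper name is ours, not the plugin's):

```python
import json

def connection_to_client_kwargs(conn):
    """Map Airflow Connection attributes to clickhouse_driver.Client kwargs,
    skipping unset attributes so driver defaults apply."""
    mapping = {'host': 'host', 'port': 'port', 'schema': 'database',
               'login': 'user', 'password': 'password'}
    kwargs = {dst: conn[src] for src, dst in mapping.items() if conn.get(src)}
    kwargs.setdefault('host', 'localhost')  # the plugin's only own default
    kwargs.update(json.loads(conn.get('extra') or '{}'))  # extra -> **kwargs
    return kwargs

conn = {'host': 'ch.example.com', 'login': 'reader',
        'extra': '{"secure": true}'}
print(connection_to_client_kwargs(conn))
# {'host': 'ch.example.com', 'user': 'reader', 'secure': True}
```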

Important note: don't try to insert values using the literal form ch_hook.run('INSERT INTO some_ch_table VALUES (1)'). clickhouse-driver requires values for an INSERT query to be provided via parameters due to specifics of the native ClickHouse protocol.
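In other words, end the statement at VALUES and pass the data as parameters; the calls below are shown against a hypothetical hook and client and are commented out because they need a server:

```python
# Correct form: the values travel as parameters, so the driver can send
# them as a native-protocol data block instead of parsing SQL literals.
sql = 'INSERT INTO some_ch_table (x) VALUES'
params = [(1,), (2,), (3,)]
# ch_hook.run(sql, params)      # with the Airflow ClickHouseHook
# client.execute(sql, params)   # with a raw clickhouse-driver Client
```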

How to run tests

Integration tests require access to a ClickHouse server. Tests use a connection URI defined via the environment variable AIRFLOW_CONN_CLICKHOUSE_DEFAULT, with clickhouse://localhost as the default.

A GitHub Action is set up for this project.

Run tests using Docker

Run ClickHouse server inside Docker:

The above command will open bash inside the container.

Install dependencies into container and run tests (execute inside container):

About

Airflow ClickHouse Plugin based on clickhouse-driver

How to connect to ClickHouse with Python using SQLAlchemy


Introduction

ClickHouse is one of the fastest open-source databases on the market, and it claims to be faster than Spark. At WhiteBox we've tested this hypothesis with a 2+ billion row table, and we can assure you it is! Our tests ran 3x faster for a complex aggregation with several filters.

Regarding this tutorial: all code and steps in this post were tested in May 2021 on Ubuntu 20.04, so please don't be evil and don't complain if the code no longer works in September 2025 😅.

Requirements

The requirements for this integration are the following:

ClickHouse server: It can be installed quite easily following the official documentation. Current version (21.4.5.46).

Setup

ClickHouse installation

This tutorial can be tested against any ClickHouse database. However, to get a local ClickHouse database for testing the integration, it can easily be installed following the steps below:

Running the command "clickhouse-client" in the shell confirms that your ClickHouse installation is working properly. It can also help you debug the SQLAlchemy DDL.

Python environment

These are the Python libraries required to run all the code in this tutorial:

Integration

SQLAlchemy setup

The following lines of code perform the SQLAlchemy standard connection:
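A minimal sketch of such a connection, assuming a local server with default credentials; the URI follows clickhouse-sqlalchemy's clickhouse+native://user:password@host:port/database scheme:

```python
# Assumed connection details for a local default installation.
user, password, host, database = 'default', '', 'localhost', 'default'
uri = f'clickhouse+native://{user}:{password}@{host}:9000/{database}'

# from sqlalchemy import create_engine  # requires sqlalchemy + a server
# engine = create_engine(uri)
# connection = engine.connect()
print(uri)  # clickhouse+native://default:@localhost:9000/default
```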

Create a new database

You can inspect the current databases in ClickHouse from the command-line connection using the command "SHOW DATABASES". The following output should be displayed on screen:


Create a new table

The following steps show how to create a MergeTree engine table in ClickHouse using the SQLAlchemy ORM model.

ORM model definition

A new table should appear in the new database:


INSERT

SELECT

Conclusions

Should ClickHouse replace traditional databases like Postgres, MySQL, or Oracle? Definitely not. Those databases have many features that ClickHouse doesn't currently have, nor intends to have in the future (basic primary key concepts, unique columns…). It can be considered an analytics database, but not a fully functioning transactional one.

However, ClickHouse's speed is so impressive that it should definitely be the go-to choice when there is a huge amount of tabular data.

mymarilyn/aioch

aioch is a library for accessing a ClickHouse database over the native interface from asyncio. It wraps the features of clickhouse-driver for asynchronous usage.


The package can be installed using pip :
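For example:

```shell
pip install aioch
```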

To install from source:

For more information see clickhouse-driver usage examples.

Other parameters are passed on to the wrapped clickhouse-driver Client.

aioch is distributed under the MIT license.

About

romario076/ClickHouseConnector

The clickhouse_driver.pandasConnector module is a ClickHouse connector for Python built on pandas.

Given your query, the module returns a pandas DataFrame.

About

ClickHouse connector for python using pandas


dbt-clickhouse 1.1.7

pip install dbt-clickhouse Copy PIP instructions

Released: Jul 11, 2022

The Clickhouse plugin for dbt (data build tool)

Navigation

Project links

Statistics

View statistics for this project via Libraries.io, or by using our public dataset on Google BigQuery

License: Apache Software License (MIT)

Requires: Python >=3.7

Maintainers

Classifiers

Project description

dbt-clickhouse

This plugin ports dbt functionality to ClickHouse.

We do not test against older versions of ClickHouse. The plugin uses syntax that requires version 22.1 or newer.

Installation

Use your favorite Python package manager to install the app from PyPI, e.g.
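For example, with pip:

```shell
pip install dbt-clickhouse
```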

Supported features

Usage Notes

Database

Model Configuration

engine: The table engine (type of table) to use when creating tables. Optional (default: MergeTree()).
order_by: A tuple of column names or arbitrary expressions. This allows you to create a small sparse index that helps find data faster. Optional (default: tuple()).
partition_by: A partition is a logical combination of records in a table by a specified criterion. The partition key can be any expression from the table columns. Optional.
unique_key: A tuple of column names that uniquely identify rows. For more details on uniqueness constraints, see here. Optional.
inserts_only: Relevant only for incremental materialization. If set to True, incremental updates will be inserted directly into the target table without creating an intermediate table. This can significantly improve performance and avoid memory limitations on big updates. Optional.
settings: A dictionary of custom settings for INSERT INTO and CREATE AS SELECT queries. Optional.

Example Profile

Running Tests

Note: The only feature that is not supported and not tested is Ephemeral materialization.

Tests running command: pytest tests/integration

You can customize a few test params through environment variables. To provide custom params, you'll need to create a test.env file under the project root (remember not to commit this file!) and define the following env variables inside:

Original Author

ClickHouse wants to thank @silentsokolov for creating this connector and for their valuable contributions.

clickhouse-client-pool 0.0.2

pip install clickhouse-client-pool Copy PIP instructions

Released: Mar 28, 2021

No project description provided

Navigation

Project links

Statistics

View statistics for this project via Libraries.io, or by using our public dataset on Google BigQuery

License: MIT License (MIT)

Author: Eric Wang

Maintainer: Eric Wang

Maintainers

Classifiers

Project description

Table of Contents

Intro

A naive, thread-safe clickhouse-client-pool based on clickhouse_driver.

Installation

clickhouse-client-pool is distributed on PyPI as a universal wheel, is available on Linux, macOS, and Windows, and supports Python 2.7/3.6+.

License

clickhouse-client-pool is distributed under the terms of the MIT license.

Performance¶

This section compares clickhouse-driver performance over the native interface with the TSV and JSONEachRow formats available over the HTTP interface.

clickhouse-driver returns already-parsed row items as Python data types. The driver performs all transformations for you.

When you read data over HTTP, you may need to cast strings into Python types.

Test data¶

Sample data for testing is taken from ClickHouse docs.

Create database and table:

Download some data for the year 2017:

Insert data into ClickHouse:

Required packages¶

For fast JSON parsing we'll use the ujson package:

Versions¶

Machine: Linux ThinkPad-T460 4.4.0-177-generic #207-Ubuntu SMP Mon Mar 16 01:16:10 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

Python: CPython 3.6.5 (default, May 30 2019, 14:48:31) [GCC 5.4.0 20160609]

Benchmarking¶

The scripts below can be benchmarked with the following one-liner:

time will measure:

Plain text without parsing¶

Let's take the plain-text response from the ClickHouse server as a baseline.

Fetching not parsed data with pure requests (1)

Parsed rows¶

A line split into elements will be considered "parsed" for the TSV format (2)

Now we cast each element to its data type (2.5)

JSONEachRow format can be loaded with json loads (3)

Get fully parsed rows with clickhouse-driver in Native format (4)

Iteration over rows¶

Iteration over TSV (5)

Now we cast each element to its data type (5.5)

Iteration over JSONEachRow (6)

Iteration over rows with clickhouse-driver in Native format (7)

Iteration over string rows¶

OK, but what if we need only string columns?

Iteration over TSV (8)

Iteration over JSONEachRow (9)

Iteration over string rows with clickhouse-driver in Native format (10)

Iteration over int rows¶

Iteration over TSV (11)

Iteration over JSONEachRow (12)

Iteration over int rows with clickhouse-driver in Native format (13)

Results¶

This table contains memory and timing benchmark results of snippets above.

JSON in the table is shorthand for JSONEachRow.

Rows                                  50k       131k      217k      450k      697k

Plain text without parsing: timing
  Naive requests.get TSV (1)          0.40 s    0.67 s    0.95 s    1.67 s    2.52 s
  Naive requests.get JSON (1)         0.61 s    1.23 s    2.09 s    3.52 s    5.20 s

Plain text without parsing: memory
  Naive requests.get TSV (1)          49 MB     107 MB    165 MB    322 MB    488 MB
  Naive requests.get JSON (1)         206 MB    564 MB    916 MB    1.83 GB   2.83 GB

Parsed rows: timing
  requests.get TSV (2)                0.81 s    1.81 s    3.09 s    7.22 s    11.87 s
  requests.get TSV with cast (2.5)    1.78 s    4.58 s    7.42 s    16.12 s   25.52 s
  requests.get JSON (3)               2.14 s    5.65 s    9.20 s    20.43 s   31.72 s
  clickhouse-driver Native (4)        0.73 s    1.40 s    2.08 s    4.03 s    6.20 s

Parsed rows: memory
  requests.get TSV (2)                171 MB    462 MB    753 MB    1.51 GB   2.33 GB
  requests.get TSV with cast (2.5)    135 MB    356 MB    576 MB    1.15 GB   1.78 GB
  requests.get JSON (3)               139 MB    366 MB    591 MB    1.18 GB   1.82 GB
  clickhouse-driver Native (4)        135 MB    337 MB    535 MB    1.05 GB   1.62 GB

Iteration over rows: timing
  requests.get TSV (5)                0.49 s    0.99 s    1.34 s    2.58 s    4.00 s
  requests.get TSV with cast (5.5)    1.38 s    3.38 s    5.40 s    10.89 s   16.59 s
  requests.get JSON (6)               1.89 s    4.73 s    7.63 s    15.63 s   24.60 s
  clickhouse-driver Native (7)        0.62 s    1.28 s    1.93 s    3.68 s    5.54 s

Iteration over rows: memory
  requests.get TSV (5)                19 MB     19 MB     19 MB     19 MB     19 MB
  requests.get TSV with cast (5.5)    19 MB     19 MB     19 MB     19 MB     19 MB
  requests.get JSON (6)               20 MB     20 MB     20 MB     20 MB     20 MB
  clickhouse-driver Native (7)        56 MB     70 MB     71 MB     71 MB     71 MB

Iteration over string rows: timing
  requests.get TSV (8)                0.40 s    0.67 s    0.80 s    1.55 s    2.18 s
  requests.get JSON (9)               1.14 s    2.64 s    4.22 s    8.48 s    12.96 s
  clickhouse-driver Native (10)       0.46 s    0.91 s    1.35 s    2.49 s    3.67 s

Iteration over string rows: memory
  requests.get TSV (8)                19 MB     19 MB     19 MB     19 MB     19 MB
  requests.get JSON (9)               20 MB     20 MB     20 MB     20 MB     20 MB
  clickhouse-driver Native (10)       46 MB     56 MB     57 MB     57 MB     57 MB

Iteration over int rows: timing
  requests.get TSV (11)               0.84 s    2.06 s    3.22 s    6.27 s    10.06 s
  requests.get JSON (12)              0.95 s    2.15 s    3.55 s    6.93 s    10.82 s
  clickhouse-driver Native (13)       0.43 s    0.61 s    0.86 s    1.53 s    2.27 s

Iteration over int rows: memory
  requests.get TSV (11)               19 MB     19 MB     19 MB     19 MB     19 MB
  requests.get JSON (12)              20 MB     20 MB     20 MB     20 MB     20 MB
  clickhouse-driver Native (13)       41 MB     48 MB     48 MB     48 MB     49 MB

Conclusion¶

If you need to get a significant number of rows from the ClickHouse server as text, then the TSV format is your choice. See the "Iteration over string rows" results.

It doesn't matter which interface you use if you only manipulate a small number of rows.


Creating a materialized view in ClickHouse

Machine setup
Our Python script from the previous articles needs to connect to ClickHouse, since it will be sending queries, so we need to open a few ports. In the AWS Dashboard, go to Network & Security, Security Groups. Our machine belongs to the launch-wizard-1 group. Open it and check the Inbound rules: we need to add rules as shown in the screenshot.


ClickHouse setup
Now let's configure ClickHouse. Edit the config.xml file in the nano editor:

Consult a hotkey cheat sheet if, like us, you can't immediately figure out how to exit nano.

so that the database can be accessed from any IP address:


Creating the table and the materialized view
Let's open the client and create our database, in which we will later create the tables:

Мы проиллюстрируем всё тот же пример сбора данных с Facebook. Информация по кампаниям может часто обновляться, и мы, в целях упражнения, хотим создать материализованное представление, которое будет автоматически пересчитывать агрегаты на основе собранных данных по затратам. Таблица в Clickhouse будет практически такой же, как DataFrame из прошлого материала. В качестве движка таблицы используем ReplacingMergeTree : он будет удалять дубликаты по ключу сортировки:

And right away let's create the materialized view:
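As a sketch, the DDL might look like the following (hypothetical database and column names adapted from the Facebook-ads example; adjust to your actual schema):

```sql
CREATE TABLE analytics.facebook_insights (
    campaign_id UInt64,
    date Date,
    spend Float64,
    impressions UInt32
) ENGINE = ReplacingMergeTree
ORDER BY (campaign_id, date);

CREATE MATERIALIZED VIEW analytics.fb_daily
ENGINE = SummingMergeTree()
ORDER BY date POPULATE AS
SELECT date, sum(spend) AS spend, sum(impressions) AS impressions
FROM analytics.facebook_insights
GROUP BY date;
```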

The details of this recipe can be found in the ClickHouse blog.

The script
Let's start writing the script. We will need a new library, clickhouse_driver, which lets us send queries to ClickHouse from a Python script:

This article covers only the additions to the script described in the article "Collecting Facebook ad campaign data". Everything will work if you simply paste the code from this article into the previous script.

To make sure everything is fine, we can write the following query, which should print the names of all databases on the server:

On success we will see a list like this:

Suppose, for example, that we want to look at data for the last three days. We will get these dates with the datetime library and convert them to the required format with the strftime() method:
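A minimal sketch with the standard library (the YYYY-MM-DD format is an assumption; use whatever format your table expects):

```python
from datetime import datetime, timedelta

# The last three days, newest first, formatted as YYYY-MM-DD strings
dates = [(datetime.now() - timedelta(days=i)).strftime('%Y-%m-%d')
         for i in range(3)]
```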

Let's write the following query, which fetches all columns of the table for this period:


hatarist/clickhouse-cli


An unofficial command-line client for the ClickHouse DBMS. It implements some common and awesome things, such as:

But it works over the HTTP port, so there are some limitations for now:

Python 3.4+ is required.

/.clickhouse-cli.rc is here for your service!

The available environment variables are:

The order of precedence is:

Reading from file / stdin

Inserting the data from file

Oh boy. It’s a very dirty (and very untested) hack that lets you define your own functions or, actually, whatever you want, by running a find & replace operation over the query before sending the query to the server.

Say, you often run queries that parse some JSON, so you use visitParamExtractString all the time:

About

A third-party client for the Clickhouse DBMS server.

clickhouse-driver

Python driver for ClickHouse

Navigation

Related Topics

Quick search

Performance¶

This section compares clickhouse-driver performance over the Native interface with the TSV and JSONEachRow formats available over the HTTP interface.

clickhouse-driver returns already parsed row items in Python data types. The driver performs all transformations for you.

When you read data over HTTP, you may need to cast strings into Python types.

Test data¶

Sample data for testing is taken from ClickHouse docs.

Create database and table:

Download some data for the year 2017:

Insert data into ClickHouse:

Required packages¶

For fast JSON parsing we'll use the ujson package:

Versions¶

Machine: Linux ThinkPad-T460 4.4.0-177-generic #207-Ubuntu SMP Mon Mar 16 01:16:10 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

Python: CPython 3.6.5 (default, May 30 2019, 14:48:31) [GCC 5.4.0 20160609]

Benchmarking¶

The scripts below can be benchmarked with the following one-liner:

The time utility will measure:

Plain text without parsing¶

Let's take the plain-text response from the ClickHouse server as a baseline.

Fetching unparsed data with pure requests (1)

Parsed rows¶

A line split into elements will be considered "parsed" for the TSV format (2)
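"Parsing" here is nothing more than splitting each response line on tab characters; a self-contained sketch with a made-up line:

```python
# A made-up TSV line with three fields: a date, a number and a string
line = '2017-01-01\t1234\tJFK'
fields = line.split('\t')
# fields == ['2017-01-01', '1234', 'JFK']
```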

Now we cast each element to its data type (2.5)
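Casting means converting every string field into its Python type by hand; a sketch assuming hypothetical columns (a Date, an integer and a string):

```python
from datetime import datetime

def cast_row(fields):
    # Hypothetical column types: Date, Int, String
    return (datetime.strptime(fields[0], '%Y-%m-%d').date(),
            int(fields[1]),
            fields[2])

row = cast_row(['2017-01-01', '1234', 'JFK'])
```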

The JSONEachRow format can be loaded with json.loads (3)
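With JSONEachRow every response line is a standalone JSON object (the field names below are made up):

```python
import json

line = '{"FlightDate": "2017-01-01", "AirlineID": 19805}'
row = json.loads(line)
# row['AirlineID'] == 19805
```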

Get fully parsed rows with clickhouse-driver in Native format (4)

Iteration over rows¶

Iteration over TSV (5)

Now we cast each element to its data type (5.5)

Iteration over JSONEachRow (6)

Iteration over rows with clickhouse-driver in Native format (7)

Iteration over string rows¶

OK, but what if we need only string columns?

Iteration over TSV (8)

Iteration over JSONEachRow (9)

Iteration over string rows with clickhouse-driver in Native format (10)

Iteration over int rows¶

Iteration over TSV (11)

Iteration over JSONEachRow (12)

Iteration over int rows with clickhouse-driver in Native format (13)

Results¶

This table contains memory and timing benchmark results of the snippets above.

JSON in the table is shorthand for JSONEachRow.

Rows                                   50k      131k     217k     450k     697k
Plain text without parsing: timing
Naive requests.get TSV (1)             0.40 s   0.67 s   0.95 s   1.67 s   2.52 s
Naive requests.get JSON (1)            0.61 s   1.23 s   2.09 s   3.52 s   5.20 s
Plain text without parsing: memory
Naive requests.get TSV (1)             49 MB    107 MB   165 MB   322 MB   488 MB
Naive requests.get JSON (1)            206 MB   564 MB   916 MB   1.83 GB  2.83 GB
Parsed rows: timing
requests.get TSV (2)                   0.81 s   1.81 s   3.09 s   7.22 s   11.87 s
requests.get TSV with cast (2.5)       1.78 s   4.58 s   7.42 s   16.12 s  25.52 s
requests.get JSON (3)                  2.14 s   5.65 s   9.20 s   20.43 s  31.72 s
clickhouse-driver Native (4)           0.73 s   1.40 s   2.08 s   4.03 s   6.20 s
Parsed rows: memory
requests.get TSV (2)                   171 MB   462 MB   753 MB   1.51 GB  2.33 GB
requests.get TSV with cast (2.5)       135 MB   356 MB   576 MB   1.15 GB  1.78 GB
requests.get JSON (3)                  139 MB   366 MB   591 MB   1.18 GB  1.82 GB
clickhouse-driver Native (4)           135 MB   337 MB   535 MB   1.05 GB  1.62 GB
Iteration over rows: timing
requests.get TSV (5)                   0.49 s   0.99 s   1.34 s   2.58 s   4.00 s
requests.get TSV with cast (5.5)       1.38 s   3.38 s   5.40 s   10.89 s  16.59 s
requests.get JSON (6)                  1.89 s   4.73 s   7.63 s   15.63 s  24.60 s
clickhouse-driver Native (7)           0.62 s   1.28 s   1.93 s   3.68 s   5.54 s
Iteration over rows: memory
requests.get TSV (5)                   19 MB    19 MB    19 MB    19 MB    19 MB
requests.get TSV with cast (5.5)       19 MB    19 MB    19 MB    19 MB    19 MB
requests.get JSON (6)                  20 MB    20 MB    20 MB    20 MB    20 MB
clickhouse-driver Native (7)           56 MB    70 MB    71 MB    71 MB    71 MB
Iteration over string rows: timing
requests.get TSV (8)                   0.40 s   0.67 s   0.80 s   1.55 s   2.18 s
requests.get JSON (9)                  1.14 s   2.64 s   4.22 s   8.48 s   12.96 s
clickhouse-driver Native (10)          0.46 s   0.91 s   1.35 s   2.49 s   3.67 s
Iteration over string rows: memory
requests.get TSV (8)                   19 MB    19 MB    19 MB    19 MB    19 MB
requests.get JSON (9)                  20 MB    20 MB    20 MB    20 MB    20 MB
clickhouse-driver Native (10)          46 MB    56 MB    57 MB    57 MB    57 MB
Iteration over int rows: timing
requests.get TSV (11)                  0.84 s   2.06 s   3.22 s   6.27 s   10.06 s
requests.get JSON (12)                 0.95 s   2.15 s   3.55 s   6.93 s   10.82 s
clickhouse-driver Native (13)          0.43 s   0.61 s   0.86 s   1.53 s   2.27 s
Iteration over int rows: memory
requests.get TSV (11)                  19 MB    19 MB    19 MB    19 MB    19 MB
requests.get JSON (12)                 20 MB    20 MB    20 MB    20 MB    20 MB
clickhouse-driver Native (13)          41 MB    48 MB    48 MB    48 MB    49 MB

Conclusion¶

If you need to fetch a significant number of rows from the ClickHouse server as text, then the TSV format is your choice. See the Iteration over string rows results.

It doesn't matter which interface you use if you manipulate a small number of rows.

clickhouse-repl 1.0.0

pip install clickhouse-repl Copy PIP instructions

Released: Jan 19, 2021

A toolkit for running ClickHouse queries interactively, leveraging the perks of an ipython console

Navigation

Statistics

View statistics for this project via Libraries.io, or by using our public dataset on Google BigQuery

License: MIT License (MIT)

Author: klic.tools

Requires: Python >=3.7

Classifiers

Project description

clickhouse-repl

A toolkit for running ClickHouse queries interactively, leveraging the perks of an ipython console

Installation

Use the package manager pip to install clickhouse-repl.

Usage

Connecting

Password prompted

If no environment variable is set, the password will be prompted for.

Password provided

Avoid this one!

Depending on the shell and the settings in place, it is possible to keep the command out of your shell history by prefixing it with a double space.

Password from Environment Variable

Connecting to specific database

Specify the database name and your session will start automatically from it.

Useful when your tables live somewhere other than ClickHouse's default database and you don't want to specify the database in every query.

Running Queries

Using run_queries

Using client / c

These are shortcuts to the clickhouse_driver.Client instance created when a clickhouse-repl session is started.

You may use it for whatever purpose you see fit.

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

clickhouse-driver-fork-0-2-4 0.0.2

pip install clickhouse-driver-fork-0-2-4 Copy PIP instructions

Released: Aug 23, 2022

A fix of version 0.2.4, for ClickHouse version 22.3

Navigation

Statistics

View statistics for this project via Libraries.io, or by using our public dataset on Google BigQuery

License: MIT License (MIT)

Maintainer: Carlos Yago

Requires: Python >=3.7

Maintainers

Classifiers

Project description

ClickHouse Python Driver

ClickHouse Python Driver with native (TCP) interface support.

Asynchronous wrapper is available here: https://github.com/mymarilyn/aioch

Features

Documentation

Usage

There are two ways to communicate with the server:

Pure Client example:

License

ClickHouse Python Driver is distributed under the MIT license.

Project details

Statistics

View statistics for this project via Libraries.io, or by using our public dataset on Google BigQuery

License: MIT License (MIT)

Maintainer: Carlos Yago

clickhouse_driver.errors.SocketTimeoutError #84

Comments

86085185 commented Apr 14, 2019

code:
from clickhouse_driver import Client
client = Client('xx.xxx.xx.xx', port=8123, database='default', user='default', password='')
client.execute('SHOW tables')

error:
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/clickhouse_driver/connection.py", line 232, in connect
self.receive_hello()
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/clickhouse_driver/connection.py", line 307, in receive_hello
packet_type = read_varint(self.fin)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/clickhouse_driver/reader.py", line 30, in read_varint
i = f.read_one()
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/clickhouse_driver/bufferedreader.py", line 48, in read_one
self.read_into_buffer()
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/clickhouse_driver/bufferedreader.py", line 143, in read_into_buffer
self.current_buffer_size = self.sock.recv_into(self.buffer)
socket.timeout: timed out
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "", line 1, in
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/clickhouse_driver/client.py", line 191, in execute
self.connection.force_connect()
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/clickhouse_driver/connection.py", line 166, in force_connect
self.connect()
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/clickhouse_driver/connection.py", line 240, in connect
'{} ({})'.format(e.strerror, self.get_description())
clickhouse_driver.errors.SocketTimeoutError: Code: 209. None (39.96.218.165:8123)

check:
curl xx.xxx.xx.xx:8123
ok
jdbc connect
ok
datagrip connect
ok

The text was updated successfully, but these errors were encountered:

xzkostyan commented Apr 14, 2019

This driver uses the native protocol (port 9000). Port 8123 is used for the HTTP protocol. Use port 9000.

Python clickhouse driver

Table of Contents

Utility to import data into ClickHouse from MySQL (mainly) and/or CSV files

Requirements and Installation

The data reader can be installed either from the GitHub repo or from the PyPI repo.

Install dependencies. MySQL repo (for mysql-community-devel )

epel (for python3 )

clickhouse-client from the Packagecloud repo at packagecloud.io. More details on installation are available at https://github.com/Altinity/clickhouse-rpm-install

and direct dependencies:

Install data reader

In case you’d like to play around with the sources this is the way to go.

MySQLdb package is used for communication with MySQL:

mysql-replication package is used for communication with MySQL also: https://github.com/noplay/python-mysql-replication

clickhouse-driver package is used for communication with ClickHouse: https://github.com/mymarilyn/clickhouse-driver

Clone sources from github

Also the following MySQL config options are required:

Expected results are:

Requirements and Limitations

Data reader understands INSERT SQL statements only. In practice this means that:

Operation General Schema

pypy significantly improves performance. You should try it. Really. A performance boost of up to 10x can be achieved. For example, you can start with the Portable PyPy distribution for Linux.

Install required modules

mysqlclient may require installing libmysqlclient-dev and gcc

Install them if need be

Now you can run data reader via pypy

Let’s walk over test example of tool launch command line options. This code snippet is taken from shell script (see more details in airline.ontime Test Case)

MySQL is already configured as described earlier. Let's migrate the existing data to ClickHouse and listen for newly arriving data in order to migrate it to ClickHouse on the fly.

Create ClickHouse table description

We have CREATE TABLE template stored in create_clickhouse_table_template.sql file.

Set up the sharding field and the primary key. These columns must not be Nullable

Create table in ClickHouse

Lock MySQL in order to prevent new data from arriving while the data migration is running. Keep the mysql client open during the whole process

This may take some time. Check all data is in ClickHouse

Start clickhouse-mysql as a replication slave, so it will listen for new data coming:

Replication will be pumping data from MySQL into ClickHouse in background and in some time we’ll see the following picture in ClickHouse:

Prepare tables templates in create_clickhouse.sql file

And create tables in ClickHouse

Pay attention to

Monitor logs for first row in replication notification of the following structure:

These records help us create the SQL statement for the data migration process. Of course, we can also peek into the MySQL database manually to understand which records will be the last ones copied by the migration process.

Pay attention to

Values for where clause in db.log_201801_1.sql are fetched from first row in replication log: INFO:first row in replication db.log_201801_1

airline.ontime Test Case

airline.ontime Data Set in CSV files

You may want to adjust dirs where to keep ZIP and CSV file

In airline_ontime_data_download.sh edit these lines:

You may want to adjust the number of files to download (downloading all of them may take some time).

Specify year and months range as you wish:

Downloading can take some time.

airline.ontime MySQL Table

airline.ontime ClickHouse Table

airline.ontime Data Reader

You may want to adjust PYTHON path and source and target hosts and usernames

airline.ontime Data Importer

You may want to adjust CSV files location, number of imported files and MySQL user/password used for import

Testing General Schema

MySQL Data Types

BIT the number of bits per value, from 1 to 64

Date and Time Types

CHAR The range of M is 0 to 255. If M is omitted, the length is 1.

VARCHAR The range of M is 0 to 65,535

BINARY similar to CHAR

VARBINARY similar to VARCHAR

TINYBLOB maximum length of 255

TINYTEXT maximum length of 255

BLOB maximum length of 65,535

TEXT maximum length of 65,535

MEDIUMBLOB maximum length of 16,777,215

MEDIUMTEXT maximum length of 16,777,215

LONGBLOB maximum length of 4,294,967,295 or 4GB

LONGTEXT maximum length of 4,294,967,295 or 4GB

ENUM can have a maximum of 65,535 distinct elements

SET can have a maximum of 64 distinct members

JSON native JSON data type defined by RFC 7159

ClickHouse Data Types

Date number of days since 1970-01-01

DateTime Unix timestamp

UInt32 0 4294967295

UInt64 0 18446744073709551615

FixedString(N) string of N bytes (not characters or code points)

String The length is not limited. The value can contain an arbitrary set of bytes, including null bytes

Date and Time Types

MySQL Test Tables

We have to split the test table into several tables because of this error, produced by MySQL:

Insert minimal acceptable values into the test table:

Insert maximum acceptable values into the test table:

clickhouse-migrations 0.3.0

pip install clickhouse-migrations Copy PIP instructions

Released: Aug 14, 2022

Navigation

Project links

Statistics

View statistics for this project via Libraries.io, or by using our public dataset on Google BigQuery

License: MIT

Tags clickhouse, migrations

Requires: Python >=3.6

Maintainers

Classifiers

Project description

Clickhouse Migrations

ClickHouse is known for its ability to store and fetch large datasets at scale.

Development and maintenance of large-scale database systems often requires constant changes to the actual DB schema. Managing the migration scripts for these changes by hand is painful.

Features:

Installation

Usage

In command line

In code

Parameter               | Description                                                      | Default
db_host                 | Clickhouse database hostname                                     | localhost
db_user                 | Clickhouse user                                                  | ****
db_password             | Clickhouse password                                              | ****
db_name                 | Clickhouse database name                                         | None
migrations_home         | Path to list of migration files                                  |
create_db_if_no_exists  | If the db_name is not present, enabling this will create the db  | True
multi_statement         | Allow multiple statements in migration files                     | True

Notes

The Clickhouse driver does not natively support executing multiple statements in a single query. To allow multiple statements in a single migration, you can use the multi_statement param. There are two important caveats:

clickhouse-driver

Python driver for ClickHouse


Supported types¶

Each ClickHouse type is deserialized to a corresponding Python type when SELECT queries are prepared. When serializing INSERT queries, clickhouse-driver accepts a broader range of Python types. The following ClickHouse types are supported by clickhouse-driver:

[U]Int8/16/32/64¶

Float32/64¶

Date¶

DateTime('timezone')¶

Timezone support is new in version 0.0.11.

Integers are interpreted as seconds without timezone (UNIX timestamps). Integers can be used when insertion into a datetime column is a bottleneck.

Setting use_client_time_zone is taken into consideration.

You can cast a DateTime column to integers if you are facing performance issues when selecting a large number of rows.
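The conversion itself is plain Python; a stdlib sketch of round-tripping a timezone-aware datetime through an integer UNIX timestamp:

```python
from datetime import datetime, timezone

dt = datetime(2017, 1, 1, tzinfo=timezone.utc)
ts = int(dt.timestamp())  # seconds since the epoch: 1483228800
back = datetime.fromtimestamp(ts, tz=timezone.utc)
assert back == dt
```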

String/FixedString(N)¶

A String column is encoded/decoded using the UTF-8 encoding.

A String column can also be returned without decoding. In that case the return values are bytes:

If a column has the FixedString type, upon returning from SELECT it may contain trailing zeroes in accordance with ClickHouse's storage format. Trailing zeroes are stripped by the driver for convenience.
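The stripping is equivalent to trimming trailing NUL bytes from the raw value, for example:

```python
# What a FixedString(8) value may look like before stripping
raw = b'abc\x00\x00\x00\x00\x00'
value = raw.rstrip(b'\x00')
# value == b'abc'
```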

Enum8/16¶

For Python 2.7, the enum34 package is used.

Currently clickhouse-driver can't handle an empty enum value due to Python's Enum mechanics. An Enum member name must not be empty. See the issue and workaround.

conda-forge/clickhouse-driver-feedstock


Package license: MIT

Summary: Python driver with native interface for ClickHouse


Current release info

Installing clickhouse-driver from the conda-forge channel can be achieved by adding conda-forge to your channels with:

Once the conda-forge channel has been enabled, clickhouse-driver can be installed with conda :

It is possible to list all of the versions of clickhouse-driver available on your platform with conda :

Alternatively, mamba repoquery may provide more information:


conda-forge is a community-led conda channel of installable packages. In order to provide high-quality builds, the process has been automated into the conda-forge GitHub organization. The conda-forge organization contains one repository for each of the installable packages. Such a repository is known as a feedstock.

A feedstock is made up of a conda recipe (the instructions on what and how to build the package) and the necessary configurations for automatic building using freely available continuous integration services. Thanks to the awesome service provided by Azure, GitHub, CircleCI, AppVeyor, Drone, and TravisCI it is possible to build and upload installable packages to the conda-forge Anaconda-Cloud channel for Linux, Windows and OSX.

For more information please check the conda-forge documentation.

If you would like to improve the clickhouse-driver recipe or build a new package version, please fork this repository and submit a PR. Upon submission, your changes will be run on the appropriate platforms to give the reviewer an opportunity to confirm that the changes result in a successful build. Once merged, the recipe will be re-built and uploaded automatically to the conda-forge channel, whereupon the built conda packages will be available for everybody to install and use from the conda-forge channel. Note that all branches in the conda-forge/clickhouse-driver-feedstock are immediately built and any created packages are uploaded, so PRs should be based on branches in forks and branches in the main repository should only be used to build distinct package versions.

In order to produce a uniquely identifiable distribution:

clickhouse-driver

Python driver for ClickHouse


Installation¶

Python Version¶

Clickhouse-driver supports Python 3.4 and newer, Python 2.7, and PyPy.

Build Dependencies¶

Example for python:alpine docker image:

By default there are wheels for Linux, Mac OS X and Windows.

Packages for Linux and Mac OS X are available for python: 2.7, 3.4, 3.5, 3.6, 3.7, 3.8.

Packages for Windows are available for python: 2.7, 3.5, 3.6, 3.7, 3.8.

Dependencies¶

These distributions will be installed automatically when installing clickhouse-driver.

Optional dependencies¶

These distributions will not be installed automatically. Clickhouse-driver will detect and use them if you install them.

Installation from PyPI¶

The package can be installed using pip:

You can install extras packages if you need compression support. Example of LZ4 compression requirements installation:

You can also specify multiple extras by separating them with a comma. Install LZ4 and ZSTD requirements:

Installation from github¶

Development version can be installed directly from github:

clickhouse-driver

Python driver for ClickHouse


Installation¶

Python Version¶

Clickhouse-driver supports Python 3.4 and newer and PyPy.

Build Dependencies¶

Example for python:alpine docker image:

By default there are wheels for Linux, Mac OS X and Windows.

Packages for Linux and Mac OS X are available for python: 3.6 – 3.10.

Packages for Windows are available for python: 3.6 – 3.10.

Starting from version 0.2.3 there are wheels for musl-based Linux distributions.

Dependencies¶

These distributions will be installed automatically when installing clickhouse-driver.

Optional dependencies¶

These distributions will not be installed automatically. Clickhouse-driver will detect and use them if you install them.

Installation from PyPI¶

The package can be installed using pip:

You can install extras packages if you need compression support. Example of LZ4 compression requirements installation:

You can also specify multiple extras by separating them with a comma. Install LZ4 and ZSTD requirements:

NumPy support¶

You can install additional packages (NumPy and Pandas) if you need NumPy support:

Supported NumPy versions are limited by the numpy package's Python support.

Installation from github¶

Development version can be installed directly from github:

infi.clickhouse-orm 0.5.1

pip install infi.clickhouse-orm==0.5.1 Copy PIP instructions

Released: Jun 28, 2016

A Python library for working with the ClickHouse database

Navigation

Project links

Statistics

View statistics for this project via Libraries.io, or by using our public dataset on Google BigQuery

License: Python Software Foundation License (PSF)

Maintainers

Classifiers

Project description

Overview

This project is a simple ORM for working with the ClickHouse database. It allows you to define model classes whose instances can be written to the database and read from it.

Installation

To install infi.clickhouse_orm:

Usage

Defining Models

Models are defined in a way reminiscent of Django’s ORM:

It is possible to provide a default value for a field, instead of its “natural” default (empty string for string fields, zero for numeric fields etc.).

See below for the supported field types and table engines.

Using Models

Once you have a model, you can create model instances:

When values are assigned to model fields, they are immediately converted to their Pythonic data type. In case the value is invalid, a ValueError is raised:

Inserting to the Database

To write your instances to ClickHouse, you need a Database instance:

This automatically connects to http://localhost:8123 and creates a database called my_test_db, unless it already exists. If necessary, you can specify a different database URL and optional credentials:

Using the Database instance you can create a table for your model, and insert instances to it:

The insert method can take any iterable of model instances, but they all must belong to the same model class.

Reading from the Database

Loading model instances from the database is simple:

Do not include a FORMAT clause in the query, since the ORM automatically sets the format to TabSeparatedWithNamesAndTypes.

It is possible to select only a subset of the columns, and the rest will receive their default values:

Ad-Hoc Models

Specifying a model class is not required. In case you do not provide a model class, an ad-hoc class will be defined based on the column names and types returned by the query:

This is a very convenient feature that saves you the need to define a model for each query, while still letting you work with Pythonic column values and an elegant syntax.

Counting

The Database class also supports counting records easily:

Field Types

Currently the following field types are supported:

Table Engines

Each model must have an engine instance, used when creating the table in ClickHouse.

To define a MergeTree engine, supply the date column name and the names (or expressions) for the key columns:

You may also provide a sampling expression:

A CollapsingMergeTree engine is defined in a similar manner, but also requires a sign column:

For a SummingMergeTree you can optionally specify the summing columns:

Data Replication

Any of the above engines can be converted to a replicated engine (e.g. ReplicatedMergeTree) by adding two parameters, replica_table_path and replica_name:

Development

After cloning the project, run the following commands:

clickhouse-driver

Python driver for ClickHouse


Welcome to clickhouse-driver¶

Welcome to clickhouse-driver’s documentation. Get started with Installation and then get an overview with the Quickstart where common queries are described.

User’s Guide¶

This part of the documentation focuses on step-by-step instructions for development with clickhouse-driver.

Clickhouse-driver is designed to communicate with a ClickHouse server from Python over the native protocol.

The ClickHouse server provides two protocols for communication: the HTTP protocol and the Native (TCP) protocol.

Each protocol has its own advantages and disadvantages. Here we focus on the advantages of the native protocol:

There is an asynchronous wrapper for clickhouse-driver: aioch. It’s available here.

API Reference

If you are looking for information on a specific function, class or method, this part of the documentation is for you.

Additional Notes

Legal information, changelog and contributing are here for the interested.

ultram4rine/sqltools-clickhouse-driver


SQLTools ClickHouse Driver

ClickHouse driver for SQLTools VS Code extension.

After installation you will be able to explore tables and views, run queries, etc. For more details see SQLTools documentation.

Don’t use ; at the end of the query. Since this driver uses the @apla/clickhouse library, a FORMAT statement is automatically added after the query; with a trailing semicolon, SQLTools thinks you are sending multiple queries, which is not supported (yet).

Use LIMIT when selecting from a table that stores more than about 100,000 records.



ODBC Driver for ClickHouse


This is the official ODBC driver implementation for accessing ClickHouse as a data source.

For more information on ClickHouse go to ClickHouse home page.

For more information on what ODBC is go to ODBC Overview.

The canonical repo for this driver is located at https://github.com/ClickHouse/clickhouse-odbc.

See LICENSE file for licensing information.

Table of contents

Pre-built binary packages of the release versions of the driver are available for the most common platforms at:

Note that since ODBC drivers are not used directly by a user, but are accessed through applications, which in turn access the driver through an ODBC driver manager, you have to install the driver for the same architecture (32- or 64-bit) as the application that is going to access it. Moreover, both the driver and the application must be compiled for (and actually use at run time) the same ODBC driver manager implementation (we call them «ODBC providers» here). There are three supported ODBC providers:

If you have Homebrew installed (usually applicable to macOS only, but also available on some Linux systems), just execute:

If you don’t see a package that matches your platform under Releases, or the version of your system differs significantly from those of the available packages, or you want to try a bleeding-edge version of the code that hasn’t been released yet, you can always build the driver manually from sources:

Native packages carry all the dependency information, so when you install the driver from a native package, all required run-time packages are installed automatically. If you use manual packaging, i.e., just extract the driver binaries to some folder, you have to make sure yourself that all the run-time dependencies are satisfied on your system:

The first step usually consists of registering the driver so that the corresponding ODBC provider is able to locate it.

The next step is defining one or more DSNs, associated with the newly registered driver, and setting driver-specific parameters in the body of those DSN definitions.

All this involves modifying dedicated registry keys in the case of MDAC, or editing odbcinst.ini (for driver registration) and odbc.ini (for DSN definition) files for UnixODBC or iODBC, directly or indirectly.

This is performed automatically, with some default values, if you install the driver using the native installers.

Otherwise, if you are configuring manually, or need to modify the default configuration created by the installer, please see the exact locations of files (or registry keys) that need to be modified in the corresponding section below:

The list of DSN parameters recognized by the driver is as follows:

URL query string

Some of the configuration parameters can be passed to the server as part of the query string of the URL.

The list of parameters in the query string of the URL that are also recognized by the driver is as follows:

Parameter        Default value    Description
database         default          Database name to connect to
default_format   ODBCDriver2      Default wire format of the resulting data that the server will send to the driver. Formats supported by the driver: ODBCDriver2 and RowBinaryWithNamesAndTypes
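For example, such parameters can be appended to the query string of the connection URL; the host and port below are placeholders:

```python
# Building a connection URL that passes `database` and `default_format`
# in the query string. Host and port here are placeholders.
from urllib.parse import urlencode, urlunsplit

params = {
    "database": "default",
    "default_format": "RowBinaryWithNamesAndTypes",
}
url = urlunsplit(("http", "example-host:8123", "/", urlencode(params), ""))
print(url)
# -> http://example-host:8123/?database=default&default_format=RowBinaryWithNamesAndTypes
```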

Note that currently there is a difference in timezone handling between the ODBCDriver2 and RowBinaryWithNamesAndTypes formats: in ODBCDriver2, date and time values are presented to the ODBC application in the server’s timezone, whereas in RowBinaryWithNamesAndTypes they are converted to the local timezone. This behavior will be changed/parametrized in the future. If the server and ODBC application timezones are the same, date and time value handling will effectively be identical between the two formats.
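The practical effect of that timezone difference can be seen with a plain datetime conversion; the fixed offsets below are purely illustrative values, not driver behavior:

```python
# Illustration of the server-vs-local timezone difference using fixed
# offsets (illustrative values only, not driver behavior).
from datetime import datetime, timedelta, timezone

server_tz = timezone(timedelta(hours=3))  # e.g. a server running at UTC+3
local_tz = timezone(timedelta(hours=1))   # e.g. an application at UTC+1

# A value as ODBCDriver2 would present it: in the server's timezone.
as_server = datetime(2022, 8, 19, 12, 0, tzinfo=server_tz)

# The same instant as RowBinaryWithNamesAndTypes would present it: local.
as_local = as_server.astimezone(local_tz)
print(as_local.hour)  # -> 10
```

Both values denote the same instant; only the wall-clock presentation differs, which is exactly why the two formats look identical when server and application share a timezone.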

Troubleshooting: driver manager tracing and driver logging

To debug issues with the driver, the first things that need to be done are:

Building from sources

The general requirements for building the driver from sources are as follows:

Additional requirements exist for each platform, which also depend on whether packaging and/or testing is performed.

See the exact steps for each platform in the corresponding section below:

The list of configuration options recognized during the CMake generation step is as follows:

Run-time dependencies: Windows

All modern Windows systems come with a preinstalled MDAC driver manager.

Run-time dependencies: macOS

Execute the following in the terminal (assuming you have Homebrew installed):

Execute the following in the terminal (assuming you have Homebrew installed):

Run-time dependencies: Red Hat/CentOS

Execute the following in the terminal:

Execute the following in the terminal:

Run-time dependencies: Debian/Ubuntu

Execute the following in the terminal:

Execute the following in the terminal:

Configuration: MDAC/WDAC (Microsoft/Windows Data Access Components)

To configure already installed drivers and DSNs, or create new DSNs, use Microsoft ODBC Data Source Administrator tool:

For full description of ODBC configuration mechanism in Windows, as well as for the case when you want to learn how to manually register a driver and have a full control on configuration in general, see:

Note that the keys are subject to the «Registry Redirection» mechanism, with caveats.

You can find sample configuration for this driver here (just map the keys to corresponding sections in registry):

In short, you will usually end up editing /etc/odbcinst.ini and /etc/odbc.ini for system-wide driver and DSN entries, and ~/.odbc.ini for user-wide driver and DSN entries.

For more info, see:

You can find sample configuration for this driver here:
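As a rough sketch of what such a configuration looks like (the section names, driver path, and connection details below are hypothetical — adjust them to your installation and to the actual sample files shipped with the driver):

```ini
; Hypothetical sketch of an odbcinst.ini driver entry and an odbc.ini DSN
; entry; the driver path and all values are placeholders.

; odbcinst.ini
[ClickHouse ODBC Driver (ANSI)]
Driver = /usr/local/lib/libclickhouseodbc.so

; odbc.ini
[ClickHouse DSN (ANSI)]
Driver   = ClickHouse ODBC Driver (ANSI)
Url      = http://localhost:8123
Database = default
```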

These samples can be added to the corresponding configuration files using the odbcinst tool (assuming the package is installed under /usr/local):

In short, you will usually end up editing /etc/odbcinst.ini and /etc/odbc.ini for system-wide driver and DSN entries, and ~/.odbc.ini for user-wide driver and DSN entries.

In macOS, if those INI files exist, they are usually symbolic or hard links to /Library/ODBC/odbcinst.ini and /Library/ODBC/odbc.ini for the system-wide configs, and ~/Library/ODBC/odbc.ini for the user-wide config, respectively.

For more info, see:

You can find sample configuration for this driver here:

Enabling driver manager tracing: MDAC/WDAC (Microsoft/Windows Data Access Components)

Comprehensive explanations (possibly, with some irrelevant vendor-specific details though) on how to enable ODBC driver manager tracing could be found at the following links:

Enabling driver manager tracing: UnixODBC

Comprehensive explanations (possibly, with some irrelevant vendor-specific details though) on how to enable ODBC driver manager tracing could be found at the following links:

Enabling driver manager tracing: iODBC

Comprehensive explanations (possibly, with some irrelevant vendor-specific details though) on how to enable ODBC driver manager tracing could be found at the following links:

Building from sources: Windows

The CMake bundled with recent versions of Visual Studio can be used.

An SDK required for building the ODBC driver is included in the Windows SDK, which in turn is bundled with Visual Studio.

All of the following commands have to be issued in Visual Studio Command Prompt:

Clone the repo with submodules:

Enter the cloned source tree, create a temporary build folder, and generate the solution and project files in it:

Build the generated solution in-place:

…and, optionally, run tests (note that for non-unit tests, preconfigured driver and DSN entries must exist that point to the binaries generated in this build folder):

Building from sources: macOS

You will need macOS 10.14 or later, Xcode 10 or later with Command Line Tools installed, as well as up-to-date Homebrew available in the system.

Install Homebrew using the following command, and follow the printed instructions on any additional steps required to complete the installation:

Then, install the latest Xcode from App Store. Open it at least once to accept the end-user license agreement and automatically install the required components.

Then, make sure that the latest Command Line Tools are installed and selected in the system:

Build-time dependencies: iODBC

Execute the following in the terminal:

Build-time dependencies: UnixODBC

Execute the following in the terminal:

Clone the repo recursively with submodules:

Enter the cloned source tree, create a temporary build folder, and generate a Makefile for the project in it:

Build the generated solution in-place:

…and, optionally, run tests (note that for non-unit tests, preconfigured driver and DSN entries must exist that point to the binaries generated in this build folder):

Building from sources: Red Hat/CentOS

Build-time dependencies: UnixODBC

Execute the following in the terminal:

Build-time dependencies: iODBC

Execute the following in the terminal:

All of the following commands have to be issued right after this one command, in the same terminal session:

Clone the repo with submodules:

Enter the cloned source tree, create a temporary build folder, and generate a Makefile for the project in it:

Build the generated solution in-place:

…and, optionally, run tests (note that for non-unit tests, preconfigured driver and DSN entries must exist that point to the binaries generated in this build folder):

Building from sources: Debian/Ubuntu

Build-time dependencies: UnixODBC

Execute the following in the terminal:

Build-time dependencies: iODBC

Execute the following in the terminal:

This assumes that the system cc and c++ point to compilers that satisfy the minimum requirements from Building from sources.

If the version of cmake is not recent enough, you can install a newer version by following the instructions on one of these pages:

Clone the repo with submodules:

Enter the cloned source tree, create a temporary build folder, and generate a Makefile for the project in it:

Build the generated solution in-place:

…and, optionally, run tests (note that for non-unit tests, preconfigured driver and DSN entries must exist that point to the binaries generated in this build folder):
