Python clickhouse driver
clickhouse-connect 0.2.4
pip install clickhouse-connect Copy PIP instructions
Released: Aug 19, 2022
ClickHouse core driver, SqlAlchemy, and Superset libraries
License: Apache Software License (Apache License 2.0)
Requires: Python >=3.7
Project description
ClickHouse Connect
A suite of Python packages for connecting Python to ClickHouse, initially supporting Apache Superset using a minimal read-only SQLAlchemy dialect. Uses the ClickHouse HTTP interface.
Installation
ClickHouse Connect requires Python 3.7 or higher. To build the optional Cython/C extensions, which improve read and write performance with the ClickHouse Native format, install the cython package before installing clickhouse_connect. After installing cython (if desired), clone this repository and run python setup.py install from the project directory.
Getting Started
Simple ‘command’ that does not return a result set.
Bulk insert of a matrix of rows and columns.
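A minimal sketch of both operations, assuming clickhouse-connect is installed and a ClickHouse server is reachable on localhost; the table and column names are hypothetical:

```python
# Sketch of a 'command' and a bulk insert with clickhouse-connect.
# Not invoked here because it requires a running server.
def demo():
    import clickhouse_connect

    client = clickhouse_connect.get_client(host='localhost')

    # 'command' is for statements that return no result set
    client.command(
        'CREATE TABLE IF NOT EXISTS test_table '
        '(key UInt16, value String) ENGINE MergeTree ORDER BY key'
    )

    # bulk insert of a matrix of rows and columns
    rows = [[100, 'value1'], [200, 'value2']]
    client.insert('test_table', rows, column_names=['key', 'value'])
```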
Minimal SQLAlchemy Support
On installation ClickHouse Connect registers the clickhousedb SQLAlchemy Dialect entry point. This dialect supports basic table reflection for table columns and datatypes, and command and query execution using DB API 2.0 cursors. Most ClickHouse datatypes have full query/cursor support.
ClickHouse Connect does not yet implement the full SQLAlchemy API for DDL (Data Definition Language) or ORM (Object Relational Mapping). These features are in development.
Superset Support
On installation ClickHouse Connect registers the clickhousedb Superset Database Engine Spec entry point. Using the clickhousedb SQLAlchemy dialect, the engine spec supports complete data exploration and Superset SQL Lab functionality with all standard ClickHouse data types. See Connecting Superset for complete instructions.
ClickHouse Enum, UUID, and IP Address datatypes are treated as strings. For compatibility with Superset Pandas dataframes, unsigned UInt64 data types are interpreted as signed Int64 values. ClickHouse CSV Upload via SuperSet is not yet implemented.
Optional Features
SQLAlchemy and Superset support require the corresponding SQLAlchemy and Apache Superset packages to be included in your Python installation. ClickHouse Connect also includes C/Cython extensions for improved performance when reading String and FixedString datatypes. These extensions will be installed automatically by setup.py if a C compiler is available.
Query results can be returned as either a numpy array or a pandas DataFrame if the numpy and pandas libraries are available. Use the client methods query_np and query_df respectively.
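A sketch of the two helper methods, assuming numpy, pandas, and a reachable server; the query is purely illustrative:

```python
# query_np returns a numpy array, query_df a pandas DataFrame.
# Wrapped in a function because it needs a running ClickHouse server.
def demo():
    import clickhouse_connect

    client = clickhouse_connect.get_client(host='localhost')
    np_array = client.query_np('SELECT number FROM system.numbers LIMIT 5')
    df = client.query_df('SELECT number FROM system.numbers LIMIT 5')
    return np_array, df
```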
Tests
Main Client Interface
Interaction with the ClickHouse server is done through a clickhouse_connect Client instance. At this point only an HTTP(s) based Client is supported.
HTTP Client constructor/initialization parameters
Create a ClickHouse client using the clickhouse_connect.driver.create_client(...) function or the clickhouse_connect.get_client(...) wrapper. All parameters are optional:
Any remaining keyword parameters are interpreted as ‘setting’ parameters to send to the ClickHouse server with every query/request
Querying data
Use the client query method to retrieve a QueryResult from ClickHouse. Parameters:
The query method returns a QueryResult object with the following fields:
Numpy and Pandas queries
Datatype options for queries
There are some convenience methods in the clickhouse_connect.driver package that control the format of some ClickHouse datatypes. These are included in part to improve Superset compatibility.
Inserting data
Use the client insert method to insert data into a ClickHouse table. Parameters:
Notes on data inserts
The client insert_df can be used to insert a Pandas DataFrame, assuming the column names in the DataFrame match the ClickHouse table column names. Note that a Numpy array can be passed directly as the data parameter to the primary insert method so there is no separate insert_np method.
For column types that can map to different native Python types (for example, UUIDs or IP addresses), the driver assumes that the data type for the whole column matches the first non-None value in the column and processes insert data accordingly. So if the first data value for an insert into a ClickHouse UUID column is a string, the driver will assume all data values in that insert column are strings.
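The described heuristic can be sketched in plain Python. This is only an illustration of the "first non-None value" rule, not the driver's actual implementation:

```python
def first_non_none_type(column):
    """Return the type of the first non-None value, per the heuristic above."""
    for value in column:
        if value is not None:
            return type(value)
    return None

# If the first present value is a string, the whole column is treated as strings:
uuid_column = [None, 'a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11']
kind = first_non_none_type(uuid_column)
```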
DDL and other "simple" SQL statements
The client command method can be used for ClickHouse commands/queries that return a single result or a single row of result values. In this case the result is returned as TabSeparated values and cast to a single string, int, or list of string values. The command method parameters are:
External data for query processing
You can pass external data along with a query:
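The shape of the external-tables structure clickhouse-driver expects can be built ahead of time; the execute() call is shown but not run because it needs a live server (the table name and columns are illustrative):

```python
# An external table: a name, a column structure, and rows as dicts.
tables = [{
    'name': 'ext',
    'structure': [('x', 'Int32'), ('y', 'Array(Int32)')],
    'data': [{'x': 100, 'y': [2, 4, 8]}],
}]
# client.execute('SELECT sum(x) FROM ext', external_tables=tables)
```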
There are a lot of ClickHouse server settings. Settings can be specified during Client initialization:
A Client with compression support can be constructed as follows:
CityHash algorithm notes
Unfortunately, the ClickHouse server ships with a built-in old version of the CityHash algorithm (1.0.2), so the original CityHash package cannot be used. An older version is published separately on PyPI.
Specifying query id
You can manually set a query identifier for each query, for example a UUID:
You can cancel a query with a specific id by sending another query with the same query id if the replace_running_query option is set to 1.
Query results are fetched by the same Client instance that emitted the query.
Retrieving results in columnar form
Columnar form can sometimes be more useful.
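Passing columnar=True to execute() returns columns instead of rows; the two forms relate via zip(), as this stdlib-only sketch shows (the row data is illustrative):

```python
# Row-oriented result vs columnar result:
# rows = client.execute(query)                 -> list of row tuples
# cols = client.execute(query, columnar=True)  -> list of column tuples
rows = [(1, 'one'), (2, 'two'), (3, 'three')]
columns = list(zip(*rows))   # transpose rows into columns
# round-trips back to rows
assert [tuple(r) for r in zip(*columns)] == rows
```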
Data type checking on INSERT
Data type checking is disabled on INSERT queries for performance. You can turn it on with the types_check option:
Query execution statistics
The Client stores statistics about the last query execution, available via the last_query attribute. Statistics are sent from the ClickHouse server and calculated on the client side. last_query contains info about:
profile: rows before limit
Receiving server logs
Query logs can be received from the server by using the send_logs_level setting:
New in version 0.1.3.
Additional connection points can be defined with alt_hosts. If the main connection point is unavailable, the driver will use the next one from alt_hosts.
This option is useful for a ClickHouse cluster with multiple replicas.
In the example above, on every new connection the driver will try the following sequence of hosts if the previous host is unavailable:
All queries within an established connection will be sent to the same host.
Python DB API 2.0
New in version 0.1.3.
This driver also implements the DB API 2.0 specification. It can be useful for various integrations.
Threads may share the module and connections.
The DB API Connection class is just a wrapper for handling multiple cursors (clients) and does not initiate actual connections to the ClickHouse server.
There are some non-standard ClickHouse-related Cursor methods for external data, settings, etc.
For automatic disposal, Connection and Cursor instances can be used as context managers:
You can use the cursor_factory argument to get results as dicts or named tuples (since version 0.2.4):
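A sketch of the DB API 2.0 path with a dict-producing cursor, assuming clickhouse-driver 0.2.4+ is installed and a server is running on localhost; the query is illustrative:

```python
# Connection/Cursor as context managers, with DictCursor for dict rows.
# Wrapped in a function because it needs a running ClickHouse server.
def demo():
    from clickhouse_driver import connect
    from clickhouse_driver.dbapi.extras import DictCursor

    with connect('clickhouse://localhost') as conn:
        with conn.cursor(cursor_factory=DictCursor) as cursor:
            cursor.execute('SELECT number FROM system.numbers LIMIT 3')
            return cursor.fetchall()  # list of dicts keyed by column name
```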
New in version 0.1.6.
Direct loading into NumPy arrays increases performance and lowers memory requirements on large amounts of rows.
Direct loading into pandas DataFrame is also supported by using query_dataframe:
Writing pandas DataFrame is also supported with insert_dataframe:
Starting from version 0.2.2, nullable columns are also supported. Keep in mind that nullable columns have object dtype. For convenience, both np.nan and None are accepted as NULL values when inserting, but only None is returned for NULL values when selecting.
It’s important to specify dtype during dataframe creation:
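A sketch combining the dtype advice with insert_dataframe, assuming pandas and clickhouse-driver are installed, a server is running, and use_numpy is enabled on the Client (the table name is hypothetical):

```python
# Nullable columns need object dtype so None survives as NULL.
# Wrapped in a function because it needs a running ClickHouse server.
def demo():
    import pandas as pd
    from clickhouse_driver import Client

    client = Client('localhost', settings={'use_numpy': True})
    df = pd.DataFrame({'x': pd.Series([1, None, 3], dtype=object)})
    client.insert_dataframe('INSERT INTO t VALUES', df)
```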
New in version 0.2.2.
Each Client instance can be used as a context manager:
Upon exit, any established connection to the ClickHouse server will be closed automatically.
clickhouse-driver 0.2.4
pip install clickhouse-driver Copy PIP instructions
Released: Jun 13, 2022
Python driver with native interface for ClickHouse
License: MIT License (MIT)
Tags ClickHouse, db, database, cloud, analytics
Requires: Python >=3.4. Maintainer: xzkostyan
Project description
ClickHouse Python Driver
ClickHouse Python Driver with native (TCP) interface support.
Asynchronous wrapper is available here: https://github.com/mymarilyn/aioch
Features
Documentation
Usage
There are two ways to communicate with the server:
Pure Client example:
License
ClickHouse Python Driver is distributed under the MIT license.
ClickHouse and Python: Getting to Know the Clickhouse-driver Client
Python is a force in the world of analytics due to powerful libraries like numpy along with a host of machine learning frameworks. ClickHouse is an increasingly popular store of data. As a Python data scientist, you may wonder how to connect them.
Fortunately, the Altinity Blog is here to solve mysteries, at least those that involve ClickHouse. This post contains a review of the clickhouse-driver client. It’s a solidly engineered module that is easy to use and integrates easily with standard tools like Jupyter Notebooks and Anaconda. Clickhouse-driver is a great way to jump into ClickHouse Python connectivity.
So Many Python Choices
The first hurdle for Python users is just picking a suitable driver. Even a quick search on pypi.org shows 22 projects with ClickHouse references. They include SQLAlchemy drivers (3 choices), async clients (also 3), and a Pandas-to-ClickHouse interface among others.
Clickhouse-driver offers a straightforward interface that enables Python clients to connect to ClickHouse, issue SELECT and DDL commands, and process results. It’s a good choice for direct Python connectivity with 16 published releases on pypi.org. The latest version is 0.0.17, published on January 10, 2019. If you want to connect to the data warehouse, issue SQL commands, and fetch back data, clickhouse-driver is a great place to start.
Code and Community
The clickhouse-driver source code is published on Github under an MIT license. The main committer is Konstantin Lebedev (@xzkostyan) though there have been a few contributions from others.
Konstantin is very responsive to questions about the driver, which you can register as issues. Much of my understanding of the wire protocol started from Konstantin’s comprehensive responses to an issue related to CSV loading that I filed early on in my use of the code. He has helped a number of other users as well.
Installation
You can of course install clickhouse-driver straight from Github but since releases are posted on pypi.org it’s far easier to use pip, like the example below. Just a note: examples are based on Python 3.7. This installation command includes lz4 compression, which can reduce data transfer sizes enormously.
For testing purposes it’s a best practice to use a virtual environment, which means the installation usually looks like the following example:
If you use Anaconda there is conveniently a clickhouse package in Anaconda Cloud. You can install it with the following command:
After doing this you can use clickhouse-driver in Jupyter Notebooks served up by Anaconda. We will dig more deeply into Anaconda integration in a future blog article. Meanwhile, this should get you started.
Documentation
One of the strengths of clickhouse-driver is excellent documentation. The docs provide a nice introduction to the code as well as detailed descriptions of the API. In fact, it was somewhat challenging to make useful code-level observations for this article because the documentation already covered API behavior so well.
The docs should probably be the first stop for new clickhouse-driver users but are easy to overlook initially since they are referenced at the bottom of the project README.md. I only noticed them after writing a couple of test programs. It would be nice if docs were published in future using Github pages, which puts a prominent link on the top of the Github project. Once you find them though you’ll refer to them regularly.
Basic Operation
Clickhouse-driver is very simple to use. The main interface is the Client class, which most programs import directly.
To set up a connection you instantiate the class with appropriate arguments. Here’s the simplest example for a connection to a localhost server using the default ClickHouse user and unencrypted communications. This is sufficient for trivial tests.
Of course, real applications are more demanding. It’s typical to see something akin to the sample code below. It has a non-default user on a secure connection with self-signed certificates. The database is also different from the usual ‘default’. To top it off we are compressing data.
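A sketch of both constructions, assuming clickhouse-driver is installed; all hostnames, credentials, and file paths are hypothetical placeholders:

```python
# Building a Client does not connect yet; the connection is made lazily
# on the first execute() call. Wrapped in a function since the import
# requires clickhouse-driver to be installed.
def make_clients():
    from clickhouse_driver import Client

    # trivial localhost connection with the default user, unencrypted
    simple = Client('localhost')

    # a more realistic setup: non-default user, TLS with a self-signed CA,
    # non-default database, lz4 compression
    secure = Client(
        'clickhouse.example.com',
        user='analyst',
        password='secret',
        database='iris_db',
        secure=True,
        verify=True,
        ca_certs='/etc/clickhouse/self-signed-ca.pem',
        compression='lz4',
    )
    return simple, secure
```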
The option flexibility is great. In particular security options are robust and include basic features corporate InfoSec teams expect. With the foregoing options clickhouse-driver auto-negotiates to TLSv1.2 on a properly configured ClickHouse server. That meets current PCI standards among others. I was also very pleased to find easy support for self-signed certificates, which are common in test scenarios.
Creating a client sets up the connection information but does not actually touch the ClickHouse server. The connection is established when you invoke the Client.execute() method. Here’s an example of a simple SELECT, followed by some code to iterate through the query result so we can see how it is put together.
The output is shown below. It’s a list of tuples containing column values.
The result format has a couple of advantages. First, it’s easy to manipulate in Python. For example, you can just print any part of the output and it will show values, which is handy for debugging. Second, you can use values immediately rather than having to figure out conversions yourselves. That’s handy because Python does not automatically do even relatively simple coercions like str to int in numerical equations.
Let’s quickly tour operations to create a table, load some data, and fetch it back.
Data definition language (DDL) like CREATE TABLE uses a single string argument. The following example splits the string across lines for readability.
INSERT statements take an extra params argument to hold the values, as shown by the following example.
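The shape of the params argument is a list of tuples matching the column order; the execute() call is shown but not run (the iris table and column names are hypothetical):

```python
# Values for an INSERT: one tuple per row, in column order.
rows = [
    (5.1, 3.5, 1.4, 0.2, 'Iris-setosa'),
    (4.9, 3.0, 1.4, 0.2, 'Iris-setosa'),
]
# client.execute(
#     'INSERT INTO iris (sepal_l, sepal_w, petal_l, petal_w, species) VALUES',
#     rows)
```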
The format for values is the same as the result format for SELECT statements. Clickhouse-driver uses a similar format in both directions. The INSERT params also support dictionary organization as well as generators, as we’ll see in a later section. See the docs for more insert examples.
We already showed an example of a SELECT statement using functions to generate output. Selecting out of a table looks pretty much the same, as shown by the following example.
Clickhouse-driver has a lot of useful features related to SELECTs. For instance, you can enable progress tracking using the Client.execute_with_progress() method, which is great when pulling down large result sets. Similarly the Client.execute_iter() method allows you to chunk results from large datasets to avoid overflowing memory. There’s even cancellation which covers you when somebody accidentally selects a few billion rows. Again, see the docs for examples.
One place where you need to be a little wary is prevention of SQL injection attacks. The procedure for query parameterization uses Python dictionary substitutions, as in the following example.
You might try to circumvent the substitution scheme by setting ‘species’ to a string like “‘Iris-setosa’ AND evil_function() = 0”. The clickhouse-driver cleverly foils this attack by escaping strings and other common data types before doing substitutions. The query ends up looking like the following, which may break but won’t call evil_function() unexpectedly.
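The escaping idea can be sketched in plain Python. This is NOT the driver's actual escaping code, only an illustration of why the quote in the malicious value stays inside a string literal:

```python
# Simplified sketch of string escaping before substitution (hypothetical).
def escape_str(value):
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return "'" + escaped + "'"

malicious = "Iris-setosa' AND evil_function() = 0 --"
query = 'SELECT * FROM iris WHERE species = %(species)s'
rendered = query % {'species': escape_str(malicious)}
# the embedded quote is escaped, so evil_function() never leaves the literal
```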
This approach will protect you from run-of-the-mill villainy with strings, but there are ways around it. For instance, it appears possible to pass in Python object types that will not be escaped properly. (Check the driver code to see why this might be so.) You should review substitution format strings carefully and also check Python parameter types at runtime to ensure something bad does not weasel through. That's especially the case for Internet-facing applications.
A Deeper Look at the ClickHouse Wire Protocol
This is a good time to discuss what’s actually happening on the wire when communicating between the Python client and ClickHouse. To set context, ClickHouse has two wire protocols: HTTP protocol which uses simple PUT and POST operations to issue queries, and a native TCP/IP protocol that ships data as typed values. These run on different ports so there’s no confusion.
Clickhouse-driver uses the native TCP/IP protocol. This choice is better for Pythonistas because the native protocol knows about types and avoids loss of precision due to binary-to-string conversions. The implementation is correct, at least for the samples that I tried. That is an impressive accomplishment, because the documentation for the native protocol is the C++ implementation code.
As you go deeper into Python access to ClickHouse it’s helpful to understand what the TCP/IP protocol is actually doing. When you run a query, ClickHouse returns results in a binary block format that contains column results in a typed binary format. Here’s an example:
Unlike many databases, ClickHouse results are column-oriented (like the storage). This means that compression works well on query results just as it does on stored values. Compression is invisible to users but can vastly reduce network traffic.
Where ClickHouse differs from many other DBMS implementations is on upload. Let’s look at the INSERT statement again from the previous section.
This format may be a little confusing if you are used to executing INSERT statements as a single string, which is typical for many DBMS types. What you are seeing is a side-effect of the native TCP/IP wire protocol, which ships typed values in both directions. The data values use a column-oriented format, just like the query output.
The TCP/IP protocol has another curious effect, which is that sending INSERTs as a single string won’t even work in clickhouse-driver. It just hangs and will eventually time out.
What’s going on? The server has the first part of the INSERT and is now waiting for data from the client to complete the INSERT in the native protocol. Meanwhile, the client is waiting for the server to respond. This behavior is clearly documented in the clickhouse-driver documentation so one could argue it’s not a bug: you are doing something the protocol does not expect. I don’t completely agree with that view, mostly because it’s confusing to newcomers. This seems like a nice pull request for somebody to work on in future.
But wait, you might ask. The C++ clickhouse-client binary will process an INSERT like the one shown above. How can that possibly work? Well, the trick is that clickhouse-client runs the same code as the ClickHouse server and can parse the query on the client side. It extracts and sends the INSERT statement up to the VALUES clause, waits for the server to send back data types, then converts and sends the data as column-oriented blocks.
Overall the wire protocol is quite reasonable once you understand what is going on. Problems like hanging INSERTs are easy to avoid. If you have further questions I suggest firing up WireShark and watching the packets on an unencrypted, uncompressed connection. It’s relatively easy to figure out what’s happening.
Loading CSV
Armed with a better understanding of what the clickhouse-driver is doing under the covers we can tackle a final topic: how to load CSV.
As we now know, you can’t just pipe raw CSV into the driver the way that the clickhouse-client program does it. Fortunately, there’s an easy solution. You can parse CSV into a list of tuples as shown in the following example.
This code works for the Iris dataset values used in this sample, which are relatively simple and automatically parse into types that load properly. For more diverse tables you may need to add additional logic to coerce types. Here’s another approach that works by assigning values in each line to a dictionary. It’s more complex but ensures types are correctly assigned. You can also rearrange the order of columns in the input and do other manipulations to clean up data.
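A stdlib-only sketch of the first approach, parsing iris-style CSV into tuples ready for execute(); the sample text is inline here, and the insert call is shown but not run:

```python
import csv
import io

# Iris-style sample rows (normally read from a file with open(...)).
CSV_TEXT = "5.1,3.5,1.4,0.2,Iris-setosa\n4.9,3.0,1.4,0.2,Iris-setosa\n"

def parse_rows(text):
    """Parse CSV text into tuples, coercing the numeric columns to float."""
    rows = []
    for line in csv.reader(io.StringIO(text)):
        rows.append((float(line[0]), float(line[1]),
                     float(line[2]), float(line[3]), line[4]))
    return rows

rows = parse_rows(CSV_TEXT)
# client.execute('INSERT INTO iris VALUES', rows)   # needs a live server
```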
As files run into the 100s of megabytes or more you may want to consider alternatives to Python to get better throughput. Parsing and converting data in Python is relatively slow compared to the C++ clickhouse-client. I would recommend load testing any Python solution for large scale data ingest to ensure you don’t hit bottlenecks.
Summary and Acknowledgments
The clickhouse-driver is relatively young but it is very capable. I am impressed by the thoughtful design, quality of the implementation, and excellent documentation. It looks like a solid base for future Python work with ClickHouse. We’ll review more Python client solutions in the future but for new users, clickhouse-driver is a great place to start.
Thanks to Konstantin Lebedev for reviewing a draft of this article!
Originally published on the Altinity blog on February 1, 2019.
How to write data to ClickHouse with Python
How to quickly deploy ClickHouse with Docker
How to build your own ClickHouse image (docker-compose from the official repo)
The ClickHouse GitHub repository contains a docker-compose.yml file with the following contents:
This file builds the images and then starts the containers. Three containers will start:
Right away: this is not the best way to run ClickHouse, since building the images takes a lot of time; it is better to use the prebuilt official ClickHouse image.
But if you do want to go the route of building your own images, you will need to run the following commands on the server:
Installing ClickHouse from the official image on hub.docker.com
The official instructions for deploying the image are here: https://hub.docker.com/r/clickhouse/clickhouse-server/.
This downloads the official Docker image (but does not start it yet):
Next, start a container from the image:
To verify that the ClickHouse container is running, run:
You can check that clickhouse-server is working by opening http://localhost:8123/ :
Then run the command:
This command connects to ClickHouse through the native client:
Command-line client
Installation on Ubuntu:
Clients and servers of different versions are compatible, but if the client is older than the server, some new features may be unavailable. It is recommended to use the same client and server versions.
Video: "Installing the ClickHouse database as a Docker container"
Installing ClickHouse with docker-compose
Next, create a db folder where ClickHouse will store its files:
Next, create a docker-compose.yml file
Then run the installation with docker-compose:
You can enter the ClickHouse client with the command:
Connecting to ClickHouse with DBeaver
DBeaver can be installed on Ubuntu through Ubuntu Software:
Choose the ClickHouse connector:
Connection settings with the default user:
show databases is a test query against ClickHouse:
Interfaces for accessing ClickHouse
ClickHouse has a rich set of features for managing network connections, both for clients and for other servers in the cluster. Nevertheless, new users may find it hard to work through the possible options, and experienced users may find it hard to make deployed systems fully accessible to applications while keeping them properly secured.
ClickHouse provides three network interfaces (each can be wrapped in TLS for additional security):
In most cases it is recommended to use an appropriate tool or library rather than interact with ClickHouse directly. Officially supported by Yandex:
There is also a wide range of third-party libraries for working with ClickHouse:
What the HTTP interface is
The HTTP interface lets you use ClickHouse on any platform, from any programming language. The HTTP interface is more limited than the native interface, but it is more broadly compatible. By default, clickhouse-server listens for HTTP on port 8123. The query is sent either as a URL parameter named query, or as the request body when using the POST method, or with the beginning of the query in the query URL parameter and the rest in the POST body. The URL size is limited to 16 KB; keep this in mind when sending large queries.
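As described above, a query can travel as the query URL parameter to the default HTTP port 8123. A stdlib-only sketch of building such a request URL (hostname and query are illustrative):

```python
from urllib.parse import urlencode

# The query is URL-encoded into the 'query' parameter of the HTTP endpoint.
url = 'http://localhost:8123/?' + urlencode({'query': 'SELECT 1'})
# with a server running, urllib.request.urlopen(url) would return b'1\n'
```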
Port 8123 is the default HTTP interface endpoint. You will use this port if you send requests to the server with curl. In addition, a number of libraries, such as the Yandex ClickHouse JDBC driver, use HTTP requests under the hood, so you may be using the HTTP interface without even realizing it.
What Native TCP (the native interface) is
The native protocol is used by the command-line client, for server-to-server communication during distributed query processing, and in other C++ programs. Unfortunately, the native ClickHouse protocol does not yet have a formal specification.
Port 9000 is the Native TCP interface endpoint (by default). It is widely used by clients, as shown in the following examples.
What gRPC is
ClickHouse supports a gRPC interface. gRPC is an open-source remote procedure call system that uses HTTP/2 and Protocol Buffers.
gRPC is a powerful framework for remote procedure calls. RPCs let you write code as if it were running on the local machine, even though it may execute on another machine.
gRPC is generally considered a better alternative to REST for microservice architectures. The "g" in gRPC can be attributed to Google, which originally developed the technology. gRPC was created to overcome the limitations of REST for microservices.
gRPC is a modern framework built on the RPC model. It keeps RPC's advantages while trying to fix the problems of traditional RPC. Instead of JSON/XML, gRPC uses protocol buffers as its interface definition language and for serialization and communication.
Protocol buffers describe the structure of data, and from that description code can be generated to produce or parse the byte stream representing the structured data. For this reason, gRPC is preferable for polyglot web applications implemented with different technologies. The binary data format makes communication lighter. gRPC can also be used with other data formats, but protocol buffers are preferred.
In addition, gRPC is built on HTTP/2, which supports bidirectional communication alongside the traditional request/response model. gRPC allows loose coupling between server and client. In practice, the client opens a long-lived connection to the gRPC server, and a new HTTP/2 stream is opened for each RPC call.
Unlike REST, which mostly uses JSON, gRPC uses protocol buffers, a better way to encode data. Since JSON is a text format, it is much heavier than compressed data in protobuf format.
Network Listener Configuration
ClickHouse makes it easy to enable and disable listener ports and to assign them new numbers. For each port type there is a simple config.xml tag, as shown in the following table. The conventional-value column shows the port number most clients assume for a given connection type. If you change a value, you may need to change clients accordingly.
| Tag | Description | Conventional value |
| http_port | Port for unencrypted HTTP requests | 8123 |
| https_port | Port for encrypted HTTPS requests | 8443 |
| interserver_http_port | Port for unencrypted HTTP replication traffic | 9009 |
| interserver_https_port | Port for encrypted HTTPS replication traffic | |
| tcp_port | Port for unencrypted native TCP/IP requests | 9000 |
| tcp_port_secure | Port for TLS-encrypted native TCP/IP requests | 9440 |
How to create a database in ClickHouse, create a table, and insert test data
Go to DBeaver and run the scripts.
1. Create a database in ClickHouse
ClickHouse and Python: Getting to Know the Clickhouse-driver Client
Python is a force in the world of analytics due to powerful libraries like numpy along with a host of machine learning frameworks. ClickHouse is an increasingly popular store of data. As a Python data scientist you may wonder how to connect them.
Fortunately the Altinity Blog is here to solve mysteries, at least those that involve ClickHouse. This post contains a review of the clickhouse-driver client. It’s a solidly engineered module that is easy to use and integrates easily with standard tools like Jupyter Notebooks and Anaconda. Clickhouse-driver is a great way to jump into ClickHouse Python connectivity.
So Many Python Choices
The first hurdle for Python users is just picking a suitable driver. Even a quick search on pypi.org shows 22 projects with ClickHouse references. They include SQLAlchemy drivers (3 choices), async clients (also 3), and a Pandas-to-ClickHouse interface among others.
Clickhouse-driver offers a straightforward interface that enables Python clients to connect to ClickHouse, issue SELECT and DDL commands, and process results. It’s a good choice for direct Python connectivity with 16 published releases on pypi.org. The latest version is 0.0.17, published on January 10, 2019. If you want to connect to the data warehouse, issue SQL commands, and fetch back data, clickhouse-driver is a great place to start.
Code and Community
The clickhouse-driver source code is published on Github under an MIT license. The main committer is Konstantin Lebedev (@xzkostyan) though there have been a few contributions from others.
Konstantin is very responsive to questions about the driver, which you can register as issues. Much of my understanding of the wire protocol started from Konstantin’s comprehensive responses to an issue related to CSV loading that I filed early on in my use of the code. He has helped a number of other users as well.
Installation
You can of course install clickhouse-driver straight from Github but since releases are posted on pypi.org it’s far easier to use pip, like the example below. Just a note: examples are based on Python 3.7. This installation command includes lz4 compression, which can reduce data transfer sizes enormously.
For testing purposes it’s a best practice to use a virtual environment, which means the installation usually looks like the following example:
If you use Anaconda there is conveniently a clickhouse package in Anaconda Cloud. You can install it with the following command:
After doing this you can use clickhouse-driver in Jupyter Notebooks served up by Anaconda. We will dig more deeply into Anaconda integration in a future blog article. Meanwhile this should get you started.
Documentation
One of the strengths of clickhouse-driver is excellent documentation. The docs provide a nice introduction to the code as well as detailed descriptions of the API. In fact, it was somewhat challenging to make useful code-level observations for this article because the documentation already covered API behavior so well.
The docs should probably be the first stop for new clickhouse-driver users but are easy to overlook initially since they are referenced at the bottom of the project README.md. I only noticed them after writing a couple of test programs. It would be nice if docs were published in future using Github pages, which puts a prominent link on the top of the Github project. Once you find them though you’ll refer to them regularly.
Basic Operation
Clickhouse-driver is very simple to use. The main interface is the Client class, which most programs import directly.
To set up a connection you instantiate the class with appropriate arguments. Here’s the simplest example for a connection to a localhost server using the default ClickHouse user and unencrypted communications. This is sufficient for trivial tests.
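A minimal sketch of that simplest case is below. Constructing a Client stores connection information only and does no network I/O, so this runs even without a server listening; the import is guarded so the sketch also runs where clickhouse-driver is not installed.

```python
host = 'localhost'
try:
    from clickhouse_driver import Client
    client = Client(host)  # connection info only; nothing is sent yet
except ImportError:
    client = None          # driver not installed; sketch only
```

The first call to client.execute() is what actually opens the connection.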
Of course real applications are more demanding. It’s typical to see something akin to the sample code below. It has a non-default user on a secure connection with self-signed certificates. The database is also different from the usual ‘default’. To top it off we are compressing data.
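A sketch of such a hardened configuration follows. The keyword names are real clickhouse-driver Client options; the host, user, password, and database values are hypothetical, and the Client call is commented out because it needs a reachable server.

```python
# Hypothetical server and credentials; keyword names are real Client options.
conn_kwargs = dict(
    host='ch1.example.com',
    user='python_user',
    password='secret',
    database='sample_data',
    secure=True,        # TLS-encrypted connection
    verify=False,       # tolerate self-signed certificates (test setups only)
    compression='lz4',  # needs the clickhouse-driver[lz4] extra
)
# from clickhouse_driver import Client
# client = Client(**conn_kwargs)
```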
The option flexibility is great. In particular security options are robust and include basic features corporate InfoSec teams expect. With the foregoing options clickhouse-driver auto-negotiates to TLSv1.2 on a properly configured ClickHouse server. That meets current PCI standards among others. I was also very pleased to find easy support for self-signed certificates, which are common in test scenarios.
Creating a client sets up the connection information but does not actually touch the ClickHouse server. The connection is established when you invoke the Client.execute() method. Here’s an example of a simple SELECT, followed by some code to iterate through the query result so we can see how it is put together.
The output is shown below. It’s a list of tuples containing column values.
The result format has a couple of advantages. First, it's easy to manipulate in Python. For example, you can just print any part of the output and it will show values, which is handy for debugging. Second, you can use values immediately rather than having to figure out conversions yourself. That's handy because Python does not automatically do even relatively simple coercions like str to int in numerical expressions.
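A tiny illustration of that second point, using sample rows shaped like a query result rather than data fetched from a live server:

```python
# Sample rows shaped like execute() output: a list of tuples of typed values.
rows = [(5.1, 'Iris-setosa'), (4.9, 'Iris-setosa'), (7.0, 'Iris-versicolor')]

# No str-to-float coercion needed; the first column arrives as a float.
total = sum(row[0] for row in rows)
```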
Let’s quickly tour operations to create a table, load some data, and fetch it back.
Data definition language (DDL) like CREATE TABLE uses a single string argument. The following example splits the string across lines for readability.
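A sketch of such a statement for a hypothetical iris table (table and column names are assumptions for illustration; the execute call needs a live server, so it is commented out):

```python
# Hypothetical iris table; MergeTree requires an ORDER BY clause.
ddl = """
CREATE TABLE iris (
    sepal_length Float64,
    sepal_width Float64,
    petal_length Float64,
    petal_width Float64,
    species String
) ENGINE = MergeTree
ORDER BY species
"""
# client.execute(ddl)
```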
INSERT statements take an extra params argument to hold the values, as shown by the following example.
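A sketch of the two-argument form, again using the hypothetical iris table; note the statement ends at VALUES and the rows travel separately:

```python
sql = ('INSERT INTO iris '
       '(sepal_length, sepal_width, petal_length, petal_width, species) '
       'VALUES')
data = [
    (5.1, 3.5, 1.4, 0.2, 'Iris-setosa'),
    (4.9, 3.0, 1.4, 0.2, 'Iris-setosa'),
]
# client.execute(sql, data)  # needs a live server
```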
The format for values is the same as the result format for SELECT statements. Clickhouse-driver uses a similar format in both directions. The INSERT params also support dictionary organization as well as generators, as we’ll see in a later section. See the docs for more insert examples.
We already showed an example of a SELECT statement using functions to generate output. Selecting out of a table looks pretty much the same, as shown by the following example.
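A minimal sketch of such a SELECT against the hypothetical iris table (the execute call is commented out because it needs a live server):

```python
# execute() would return a list of (float, str) tuples for this query.
sql = 'SELECT sepal_length, species FROM iris LIMIT 3'
# rows = client.execute(sql)
```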
Clickhouse-driver has a lot of useful features related to SELECTs. For instance, you can enable progress tracking using the Client.execute_with_progress() method, which is great when pulling down large result sets. Similarly the Client.execute_iter() method allows you to chunk results from large datasets to avoid overflowing memory. There’s even cancellation which covers you when somebody accidentally selects a few billion rows. Again, see the docs for examples.
One place where you need to be a little wary is prevention of SQL injection attacks. The procedure for query parameterization uses Python dictionary substitutions, as in the following example.
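The parameterization pattern looks like the sketch below, again with the hypothetical iris table; the driver substitutes and escapes the values from the dictionary before sending the query.

```python
sql = 'SELECT count(*) FROM iris WHERE species = %(species)s'
params = {'species': 'Iris-setosa'}
# client.execute(sql, params)  # needs a live server
```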
You might try to circumvent the substitution scheme by setting ‘species’ to a string like “‘Iris-setosa’ AND evil_function() = 0”. The clickhouse-driver cleverly foils this attack by escaping strings and other common data types before doing substitutions. The query ends up looking like the following, which may break but won’t call evil_function() unexpectedly.
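The effect can be demonstrated with a simplified stand-in for the driver's string escaping (this is not the driver's actual code, just an illustration of why the attack string is defused):

```python
def escape_str(value):
    # Simplified stand-in for the driver's string escaping.
    return "'" + value.replace('\\', '\\\\').replace("'", "\\'") + "'"

attack = "'Iris-setosa' AND evil_function() = 0"
query = 'SELECT * FROM iris WHERE species = %(species)s' % {
    'species': escape_str(attack)}
# The quotes inside the attack are escaped, so evil_function() stays inside
# a string literal instead of becoming part of the WHERE clause.
```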
This approach will protect you from run-of-the-mill villainy with strings, but there are ways around it. For instance, it appears possible to pass in Python object types that will not be escaped properly. (Check the driver's substitution code to see why this might be so.) You should review substitution format strings carefully and also check Python parameter types at runtime to ensure something bad does not weasel through. That's especially the case for Internet-facing applications.
A Deeper Look at the ClickHouse Wire Protocol
This is a good time to discuss what's actually happening on the wire when communicating between the Python client and ClickHouse. To set context, ClickHouse has two wire protocols: an HTTP protocol, which uses simple GET and POST operations to issue queries, and a native TCP/IP protocol that ships data as typed values. These run on different ports so there's no confusion.
Clickhouse-driver uses the native TCP/IP protocol. This choice is better for Pythonistas because the native protocol knows about types and avoids loss of precision due to binary-to-string conversions. The implementation is correct, at least for the samples that I tried. That is an impressive accomplishment, because the documentation for the native protocol is the C++ implementation code.
As you go deeper into Python access to ClickHouse it’s helpful to understand what the TCP/IP protocol is actually doing. When you run a query, ClickHouse returns results in a binary block format that contains column results in a typed binary format. Here’s an example:
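Here is a conceptual sketch of one such block, rendered as Python data rather than actual wire bytes: column names, ClickHouse type names, and the values travel together, column by column.

```python
# Conceptual sketch of a result block (illustration, not the wire format):
block = {
    'columns': [
        ('number', 'UInt64', [0, 1, 2]),
        ('greeting', 'String', ['a', 'b', 'c']),
    ],
}
```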
Unlike many databases ClickHouse results are column-oriented (like the storage). This means that compression works well on query results just as it does on stored values. Compression is invisible to users but can vastly reduce network traffic.
Where ClickHouse differs from many other DBMS implementations is on upload. Let's look at the INSERT statement again from the previous section.
This format may be a little confusing if you are used to executing INSERT statements as a single string, which is typical for many DBMS types. What you are seeing is a side-effect of the native TCP/IP wire protocol, which ships typed values in both directions. The data values use a column-oriented format, just like the query output.
The TCP/IP protocol has another curious effect, which is that sending INSERTs as a single string won’t even work in clickhouse-driver. It just hangs and will eventually time out.
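The contrast looks like this sketch (hypothetical iris table; driver calls commented out since they need a server):

```python
bad = "INSERT INTO iris VALUES (5.1, 3.5, 1.4, 0.2, 'Iris-setosa')"
# client.execute(bad)  # hangs: the server waits for data blocks that never come

good = ('INSERT INTO iris '
        '(sepal_length, sepal_width, petal_length, petal_width, species) '
        'VALUES')
data = [(5.1, 3.5, 1.4, 0.2, 'Iris-setosa')]
# client.execute(good, data)  # typed values arrive as column-oriented blocks
```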
What's going on? The server has the first part of the INSERT and is now waiting for data from the client to complete it in the native protocol. Meanwhile, the client is waiting for the server to respond. This behavior is clearly documented in the clickhouse-driver documentation, so one could argue it's not a bug: you are doing something the protocol does not expect. I don't completely agree with that view, mostly because it's confusing to newcomers. This seems like a nice pull request for somebody to work on in the future.
But wait, you might ask. The C++ clickhouse-client binary will process an INSERT like the one shown above. How can that possibly work? Well, the trick is that clickhouse-client runs the same code as the ClickHouse server and can parse the query on the client side. It extracts and sends the INSERT statement up to the VALUES clause, waits for the server to send back data types, then converts and sends the data as column-oriented blocks.
Overall the wire protocol is quite reasonable once you understand what is going on. Problems like hanging INSERTs are easy to avoid. If you have further questions I suggest firing up Wireshark and watching the packets on an unencrypted, uncompressed connection. It's relatively easy to figure out what's happening.
Loading CSV
Armed with a better understanding of what the clickhouse-driver is doing under the covers we can tackle a final topic: how to load CSV.
As we now know, you can't just pipe raw CSV into the driver the way the clickhouse-client program does it. Fortunately, there's an easy solution. You can parse CSV into a list of tuples as shown in the following example.
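A sketch of that parse using the standard csv module (column layout assumes the hypothetical iris table; the driver call is commented out since it needs a server):

```python
import csv

def read_rows(lines):
    # Coerce the four numeric columns with float(); leave species as str.
    return [
        (float(sl), float(sw), float(pl), float(pw), species)
        for sl, sw, pl, pw, species in csv.reader(lines)
    ]

rows = read_rows(['5.1,3.5,1.4,0.2,Iris-setosa',
                  '4.9,3.0,1.4,0.2,Iris-setosa'])
# client.execute('INSERT INTO iris VALUES', rows)
```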
This code works for the Iris dataset values used in this sample, which are relatively simple and automatically parse into types that load properly. For more diverse tables you may need to add additional logic to coerce types. Here’s another approach that works by assigning values in each line to a dictionary. It’s more complex but ensures types are correctly assigned. You can also rearrange the order of columns in the input and do other manipulations to clean up data.
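A sketch of the dictionary variant, with explicit per-column coercion (clickhouse-driver accepts dicts as INSERT params as well as tuples; column names are the same hypothetical iris layout):

```python
import csv

FIELDS = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width',
          'species']

def iter_rows(lines):
    # Each line becomes a dict keyed by column name with explicit coercions.
    for rec in csv.DictReader(lines, fieldnames=FIELDS):
        yield {
            'sepal_length': float(rec['sepal_length']),
            'sepal_width': float(rec['sepal_width']),
            'petal_length': float(rec['petal_length']),
            'petal_width': float(rec['petal_width']),
            'species': rec['species'],
        }

rows = list(iter_rows(['5.1,3.5,1.4,0.2,Iris-setosa']))
# client.execute('INSERT INTO iris VALUES', rows)
```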
As files run into the 100s of megabytes or more you may want to consider alternatives to Python to get better throughput. Parsing and converting data in Python is relatively slow compared to the C++ clickhouse-client. I would recommend load testing any Python solution for large scale data ingest to ensure you don’t hit bottlenecks.
Summary and Acknowledgments
The clickhouse-driver is relatively young but it is very capable. I am impressed by the thoughtful design, quality of the implementation, and excellent documentation. It looks like a solid base for future Python work with ClickHouse. We’ll review more Python client solutions in the future but for new users clickhouse-driver is a great place to start.
Thanks to Konstantin Lebedev for reviewing a draft of this article!
This page gives a good introduction to clickhouse-driver. It assumes you already have clickhouse-driver installed. If you do not, head over to the :ref:`installation` section.
A minimal working example looks like this:
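That minimal example is essentially the sketch below; the import is guarded and the execute call commented out so the sketch stands on its own without an installed driver or running server.

```python
sql = 'SHOW TABLES'
try:
    from clickhouse_driver import Client
    client = Client('localhost')
    # print(client.execute(sql))  # needs a running ClickHouse server
except ImportError:
    pass  # driver not installed; sketch only
```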
This code will show all tables from ‘default’ database.
There are two conceptual types of queries: read-only queries (SELECT, SHOW, and so on) and read-write queries (INSERT).
Every query should be executed by calling one of the client's execute methods: execute, execute_with_progress, or execute_iter.
Simple select query looks like:
Of course queries can and should be parameterized to avoid SQL injections:
Percent symbols in inlined constants should be doubled if you mix constants containing a % symbol with %(myvar)s parameters.
NOTE: formatting queries using Python’s f-strings or concatenation can lead to SQL injections. Use %(myvar)s parameters instead.
Customising SELECT output with the FORMAT clause is not supported.
Selecting data with progress statistics
You can get query progress statistics by using execute_with_progress. It can be useful for cancelling long queries.
When you are dealing with large datasets, streaming results block by block may be useful:
Insert queries in the native protocol are a little bit tricky because of ClickHouse's columnar nature, and because we're using Python.
An INSERT query consists of two parts: the query statement and the query values. Query values are split into chunks called blocks. Each block is sent in binary columnar form.
As data in each block is sent in binary form, we should not serialize values into strings using %(a)s substitution and then deserialize them back into Python types.
This INSERT would be extremely slow if executed with thousands of rows of data:
To insert data efficiently, provide data separately, and end your statement with a VALUES clause:
You can use any iterable yielding lists, tuples or dicts.
If data is not passed, the connection will be terminated after a timeout.
The following WILL NOT work:
Of course, for INSERT … SELECT queries no extra data is needed: ClickHouse will execute such a query like a usual SELECT query.
Inserting data in different formats with FORMAT clause is not supported.
See :ref:`insert-from-csv-file` if you need to insert data in a custom format.
DDL queries can be executed in the same way SELECT queries are executed:
Async and multithreading
Every ClickHouse query is assigned an identifier to enable request execution tracking. However, ClickHouse native protocol is synchronous: all incoming queries are executed consecutively. Clickhouse-driver does not yet implement a connection pool.
To utilize ClickHouse’s asynchronous capability you should either use multiple Client instances or implement a queue.
The same thing applies to multithreading. Queries from different threads can't share one Client instance with a single connection; you should use a different client for each thread.
However, if you are using the DB API for communication with the server, each cursor creates its own Client instance. This makes communication thread-safe.
This part of the documentation covers basic classes of the driver: Client, Connection and others.
Client
Client for communication with the ClickHouse server. A single connection is established per instance of the client.
The following keys, when passed in settings, are used for configuring the client itself:
Disconnects from the server.
execute(query, params=None, with_column_types=False, external_tables=None, query_id=None, settings=None, types_check=False, columnar=False)
Establishes a new connection if one wasn't established yet. After query execution the connection remains intact for subsequent queries. If the connection can't be reused, it will be closed and a new connection created.
execute_iter(query, params=None, with_column_types=False, external_tables=None, query_id=None, settings=None, types_check=False)
New in version 0.0.14.
execute_with_progress(query, params=None, with_column_types=False, external_tables=None, query_id=None, settings=None, types_check=False, columnar=False)
classmethod from_url(url)
Return a client configured from the given URL.
Any additional querystring arguments will be passed along to the Connection class’s initializer.
insert_dataframe(query, dataframe, external_tables=None, query_id=None, settings=None)
New in version 0.2.0.
Inserts a pandas DataFrame with the specified query.
Returns: the number of inserted rows.
query_dataframe(query, params=None, external_tables=None, query_id=None, settings=None)
New in version 0.2.0.
Queries a DataFrame with the specified SELECT query.
Connection
Represents a connection between the client and the ClickHouse server.
Closes the connection between server and client. Frees resources: e.g. closes the socket.
QueryResult
Stores query result from multiple blocks.
get_result()
Returns: stored query result.
ProgressQueryResult
Stores query result and progress information from multiple blocks. Provides iteration over query progress.
get_result()
Returns: stored query result.
IterQueryResult
Provides iteration over returned data by chunks (streaming by chunks).
Immowelt/PyClickhouse
Minimalist Clickhouse Python driver with an API roughly resembling Python DB API 2.0 specification.
To develop or run anything in this project, it is recommended to set up a virtual environment using the provided Pipfile:
This will recreate the virtual environment as well, if necessary.
Makefile and running tests
The Makefile target test is provided to run the project’s tests. These require access to a running instance of Clickhouse, which is provided through docker. This assumes that docker is installed and the current user can use it without sudo.
A one-liner to run the tests in the virtual environment would be:
ClickHouse/clickhouse-connect
A suite of Python packages for connecting Python to ClickHouse, initially supporting Apache Superset using a minimal read only SQLAlchemy dialect. Uses the ClickHouse HTTP interface.
ClickHouse Connect requires Python 3.7 or higher. The cython package must be installed prior to installing clickhouse_connect to build and install the optional Cython/C extensions used for improving read and write performance using the ClickHouse Native format. After installing cython if desired, clone this repository and run python setup.py install from the project directory.
Simple ‘command’ that does not return a result set.
Bulk insert of a matrix of rows and columns.
Minimal SQLAlchemy Support
On installation ClickHouse Connect registers the clickhousedb SQLAlchemy Dialect entry point. This dialect supports basic table reflection for table columns and datatypes, and command and query execution using DB API 2.0 cursors. Most ClickHouse datatypes have full query/cursor support.
ClickHouse Connect does not yet implement the full SQLAlchemy API for DDL (Data Definition Language) or ORM (Object Relational Mapping). These features are in development.
On installation ClickHouse Connect registers the clickhousedb Superset Database Engine Spec entry point. Using the clickhousedb SQLAlchemy dialect, the engine spec supports complete data exploration and Superset SQL Lab functionality with all standard ClickHouse data types. See Connecting Superset for complete instructions.
ClickHouse Enum, UUID, and IP Address datatypes are treated as strings. For compatibility with Superset Pandas dataframes, unsigned UInt64 data types are interpreted as signed Int64 values. ClickHouse CSV Upload via SuperSet is not yet implemented.
SQLAlchemy and Superset require the corresponding SQLAlchemy and Apache Superset packages to be included in your Python installation. ClickHouse connect also includes C/Cython extensions for improved performance reading String and FixedString datatypes. These extensions will be installed automatically by setup.py if a C compiler is available.
Query results can be returned as either a numpy array or a pandas DataFrame if the numpy and pandas libraries are available. Use the client methods query_np and query_df respectively.
Main Client Interface
Interaction with the ClickHouse server is done through a clickhouse_connect Client instance. At this point only an HTTP(s) based Client is supported.
HTTP Client constructor/initialization parameters
Create a ClickHouse client using the clickhouse_connect.driver.create_client(...) function or the clickhouse_connect.get_client(...) wrapper. All parameters are optional:
Any remaining keyword parameters are interpreted as ‘setting’ parameters to send to the ClickHouse server with every query/request
Use the client query method to retrieve a QueryResult from ClickHouse. Parameters:
The query method returns a QueryResult object with the following fields:
Numpy and Pandas queries
Datatype options for queries
There are some convenience methods in the clickhouse_connect.driver package that control the format of some ClickHouse datatypes. These are included in part to improve Superset compatibility.
Use the client insert method to insert data into a ClickHouse table. Parameters:
Notes on data inserts
The client insert_df method can be used to insert a Pandas DataFrame, assuming the column names in the DataFrame match the ClickHouse table column names. Note that a NumPy array can be passed directly as the data parameter to the primary insert method, so there is no separate insert_np method.
For column types that can map to different native Python types (for example, UUIDs or IP addresses), the driver will assume that the data type for the whole column matches the first non-None value in the column and process insert data accordingly. So if the first data value for insert into a ClickHouse UUID column is a string, the driver will assume all data values in that insert column are strings.
DDL and other "simple" SQL statements
The client command method can be used for ClickHouse commands/queries that return a single result or row of result values. In this case the result is returned as a single row of TabSeparated values and is cast to a single string, int, or list of string values. The command method parameters are:
Features
External data for query processing
You can pass external data alongside the query:
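The external-tables structure follows the clickhouse-driver documentation: each external table is a dict with a name, a column structure, and row dicts. The execute call is commented out since it needs a live server.

```python
tables = [{
    'name': 'ext',
    'structure': [('x', 'Int32'), ('y', 'Array(Int32)')],
    'data': [
        {'x': 100, 'y': [2, 4, 6, 8]},
        {'x': 500, 'y': [1, 3, 5, 7]},
    ],
}]
# client.execute('SELECT sum(x) FROM ext', external_tables=tables)
```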
Settings
There are a lot of ClickHouse server settings. Settings can be specified during Client initialization:
Each setting can be overridden in an execute statement:
Compression
Client with compression support can be constructed as follows:
CityHash algorithm notes
Unfortunately the ClickHouse server comes with a built-in old version of the CityHash algorithm (1.0.2). That's why we can't use the original CityHash package; the older version is published separately on PyPI.
Secure connection
Specifying query id
You can manually set a query identifier for each query, a UUID for example:
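A small sketch using the standard uuid module (the execute call is commented out since it needs a live server):

```python
import uuid

query_id = str(uuid.uuid4())  # any unique string works; a UUID is convenient
# client.execute('SELECT now()', query_id=query_id)
```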
You can cancel query with specific id by sending another query with the same query id if option replace_running_query is set to 1.
Query results are fetched by the same instance of Client that emitted query.
Retrieving results in columnar form
Columnar form can sometimes be more useful.
Data type checking on INSERT
Data type checking is disabled on INSERT queries for performance. You can turn it on with the types_check option:
Query execution statistics
The client stores statistics about the last query execution. They can be obtained by accessing the last_query attribute. Statistics are sent from the ClickHouse server and calculated on the client side. last_query contains info about:
profile: rows before limit
Receiving server logs
Query logs can be received from the server by using the send_logs_level setting:
Multiple hosts
New in version 0.1.3.
This option is good for a ClickHouse cluster with multiple replicas.
In the example above, on every new connection the driver will try the following sequence of hosts if the previous host is unavailable:
All queries within established connection will be sent to the same host.
Python DB API 2.0
New in version 0.1.3.
This driver also implements the DB API 2.0 specification. It can be useful for various integrations.
Threads may share the module and connections.
The Connection class is just a wrapper for handling multiple cursors (clients) and does not initiate actual connections to the ClickHouse server.
There are some non-standard ClickHouse-related Cursor methods for: external data, settings, etc.
For automatic disposal, Connection and Cursor instances can be used as context managers:
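A sketch of the DB API usage (the DSN value is hypothetical, and the connect/cursor calls are commented out since they need an installed driver and a live server):

```python
dsn = 'clickhouse://localhost'  # hypothetical local server
# from clickhouse_driver import dbapi
# with dbapi.connect(dsn) as conn:
#     with conn.cursor() as cursor:
#         cursor.execute('SELECT 1')
#         print(cursor.fetchall())
```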
Clickhouse-driver supports Python 3.4 and newer and PyPy.
Starting from version 0.1.0, gcc, python and linux headers are required to build from source.
Example for python:alpine docker image:
By default there are wheels for Linux, Mac OS X and Windows.
Starting from version 0.2.3 there are wheels for musl-based Linux distributions.
These distributions will be installed automatically when installing clickhouse-driver.
These distributions will not be installed automatically. Clickhouse-driver will detect and use them if you install them.
Installation from PyPI
The package can be installed using pip:
You can install extras packages if you need compression support. Example of LZ4 compression requirements installation:
You also can specify multiple extras by using comma. Install LZ4 and ZSTD requirements:
You can install additional packages (NumPy and Pandas) if you need NumPy support:
NumPy supported versions are limited by numpy package python support.
Installation from GitHub
The development version can be installed directly from GitHub:
long2ice/asynch
asynch is an asyncio ClickHouse Python driver with native (TCP) interface support, which reuses most of clickhouse-driver and complies with PEP 249.
Connect to ClickHouse
Create table by sql
Use DictCursor to get result with dict
Insert data with dict
Insert data with tuple
Use connection pool
This project is licensed under the Apache-2.0 License.
aio-clickhouse 0.0.5
pip install aio-clickhouse Copy PIP instructions
Released: Nov 19, 2021
Library for accessing a ClickHouse database over native interface from the asyncio
License: MIT License (MIT)
Tags ClickHouse, db, database, cloud, analytics, asyncio
aioch
aioch is a library for accessing a ClickHouse database over the native interface from asyncio. It wraps features of clickhouse-driver for asynchronous usage.
Installation
The package can be installed using pip :
Usage
For more information see clickhouse-driver usage examples.
Parameters
Other parameters are passed to the wrapped clickhouse-driver Client.
License
aioch is distributed under the MIT license.
Welcome to clickhouse-driver
Welcome to clickhouse-driver’s documentation. Get started with Installation and then get an overview with the Quickstart where common queries are described.
User's Guide
This part of the documentation focuses on step-by-step instructions for development with clickhouse-driver.
Clickhouse-driver is designed to communicate with ClickHouse server from Python over native protocol.
The ClickHouse server provides two protocols for communication: the HTTP protocol and the Native (TCP) protocol.
Each protocol has its own advantages and disadvantages. Here we focus on advantages of the native protocol:
There is an asynchronous wrapper for clickhouse-driver: aioch. It’s available here.
API Reference
If you are looking for information on a specific function, class or method, this part of the documentation is for you.
Additional Notes
Legal information, changelog and contributing are here for the interested.
pavelmaksimov/clickhousepy
A Python wrapper for Clickhouse database queries.
The wrapper is built around clickhouse-driver.
Written for Python 3.5.
Getting Data from Clickhouse in Pandas Dataframe Format
Brief documentation of some methods
A method for copying data from one table to another, checking the number of rows after copying.
A method for copying data from one table to another while removing duplicate rows.
You can contact me at Telegram, Facebook
Good luck, friend! Leave a star 😉
maximdanilchenko/aiochclient
An async http(s) ClickHouse client for python 3.6+ supporting type conversion in both directions, streaming, lazy decoding on select queries, and a fully typed interface.
You can use it with either aiohttp or httpx http connectors.
To use with aiohttp, install it with the command:
Or aiochclient[aiohttp-speedups] to install with extra speedups.
To use with httpx, install it with the command:
Or aiochclient[httpx-speedups] to install with extra speedups.
Installing with [*-speedups] adds the following:
Additionally the installation process attempts to use Cython for a speed boost (roughly 30% faster).
Connecting to ClickHouse
aiochclient needs aiohttp.ClientSession or httpx.AsyncClient to connect to ClickHouse:
Querying the database
For fetching all rows at once use the fetch method:
For fetching the first row from the result, use the fetchrow method:
You can also use the fetchval method, which returns the first value of the first row of the query result:
With async iteration on the query results stream you can fetch multiple rows without loading them all into memory at once:
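The streaming pattern looks like this; `fake_iterate` stands in for `client.iterate(...)` so the sketch runs without a server:

```python
import asyncio

async def fake_iterate(query):
    # Stand-in for client.iterate(query): yields rows one at a time
    # without materializing the whole result set in memory.
    for row in [(1, "a"), (2, "b"), (3, "c")]:
        yield row

async def main():
    total = 0
    # With aiochclient this would be: async for row in client.iterate(...)
    async for row in fake_iterate("SELECT id, name FROM t"):
        total += row[0]
    return total

print(asyncio.run(main()))  # 6
```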
Use fetch / fetchrow / fetchval / iterate for SELECT queries, and execute (or any of the above) for INSERT and all other queries.
Working with query results
All fetch queries return rows as lightweight, memory-efficient objects. Before v1.0.0, rows were returned only as tuples. All rows have a full mapping interface, where you can get fields by name or by index:
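Conceptually, such a row behaves like this minimal sketch (illustrative only, not aiochclient's actual record class):

```python
class Row:
    """Minimal sketch of a row supporting access by index and by name."""

    def __init__(self, names, values):
        self._names = tuple(names)
        self._values = tuple(values)

    def __getitem__(self, key):
        # String keys look up by column name, integers by position.
        if isinstance(key, str):
            return self._values[self._names.index(key)]
        return self._values[key]

    def keys(self):
        return self._names

    def values(self):
        return self._values

row = Row(["id", "name"], [1, "alice"])
print(row[0], row["name"])  # 1 alice
```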
To check out the API docs, visit the readthedocs site.
aiochclient automatically converts types from ClickHouse to Python types and vice versa.
Connection Pool Settings
aiochclient uses the aiohttp.TCPConnector to determine pool size. By default, the pool limit is 100 open connections.
It's highly recommended to use uvloop and to install aiochclient with speedups for the sake of speed. Some recent benchmarks on our machines, without parallelization:
Note: these benchmarks are system dependent
About
Lightweight async HTTP(S) ClickHouse client for Python 3.6+ with type conversion
mymarilyn/clickhouse-driver
ClickHouse Python Driver
ClickHouse Python Driver with native (TCP) interface support.
Asynchronous wrapper is available here: https://github.com/mymarilyn/aioch
There are two ways to communicate with the server:
Pure Client example:
ClickHouse Python Driver is distributed under the MIT license.
About
ClickHouse Python Driver with native interface support
Speeding up the ClickHouse driver
ClickHouse is the fastest analytical DBMS in the world. For those not yet familiar with it, I highly recommend giving it a try; you won't want to go back to MySQL or Postgres afterwards.
Data is usually stored in ClickHouse in raw, unaggregated form and aggregated on the fly while executing SQL queries. But in data science work you often need to export the raw data itself for further in-memory processing (for example, to train a model on it). If you export data to a text file with the native ClickHouse client, everything is quite fast: "ClickHouse doesn't slow down"™. But if you use the Python driver, the export drags on for a long time. Why?
Python represents all numbers as objects. This means the driver walks over the loaded data, converts each number into an object, and then assembles a Python list (of pointers) out of these objects. This operation is called boxing, and with large volumes of data it takes significant time. In fact, while loading data through the Python driver, the CPU spends most of its time repacking numbers from machine representation into objects.
At the same time, data science work is usually done with numpy arrays (pandas also works on top of numpy), which store numbers in machine representation, as in C. That is, first we spend a long time boxing numbers into objects, and then, when converting the Python list into a numpy array, we unbox the objects back into numbers. Clearly, the intermediate object representation only gets in the way: if the driver could export data directly into numpy arrays, the process would go much faster. The driver can't do that out of the box, so I extended it a little to add this capability.
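The cost of boxing is easy to see using only the standard library: the array module stores machine integers much like numpy does, while a list stores pointers to full int objects:

```python
import sys
from array import array

n = 100_000
boxed = list(range(n))         # each number is a full Python int object
packed = array("q", range(n))  # 8-byte machine integers, numpy-style

# Size of the container itself plus, for the list, the int objects
# it points to.
boxed_bytes = sys.getsizeof(boxed) + sum(sys.getsizeof(x) for x in boxed)
packed_bytes = sys.getsizeof(packed)

print(boxed_bytes // packed_bytes)  # boxed ints typically take ~4x the memory
```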
Installation
Usage
data will contain a set of columns. Columns holding numbers or timestamps will be numpy arrays; the remaining columns (for example, strings) will be ordinary Python lists. The following ClickHouse types are converted to numpy format: Int8/16/32/64, UInt8/16/32/64, DateTime.
The resulting data is often converted into a pandas DataFrame with column names matching the column names in the database. To avoid doing this by hand every time, a query_dataframe() method has been added to the Client class:
The result is a DataFrame with two columns, a and b.
Benchmarks
The benchmark measured the execution time of a SELECT x1, x2, ..., xn FROM table query on a table with 100 million records (real data from the Yandex.Metrica Logs API), engine=MergeTree. The queries ran against a local ClickHouse server with default driver settings.
| Query | Time, numpy | Time, standard | Speedup | Memory, numpy | Memory, standard |
|---|---|---|---|---|---|
| 4 Int8 columns | 0.34 s | 5.8 s | ×17 | 0.82 GB | 3.3 GB |
| 2 Int64 columns | 1.38 s | 12 s | ×8.7 | 2.61 GB | 9.7 GB |
| 1 DateTime column | 12.1 s | 7.1 m | ×35 | 1.16 GB | 4.8 GB |
Using numpy speeds up reads by an order of magnitude. The speedup is especially striking for DateTime, because working with time at the level of Python datetime objects is very slow. In fact, without numpy the execution time of a query that includes a time column goes beyond all reason.
The last two columns show the memory used by the process after executing the query. Using numpy not only speeds up data loading but also reduces the required memory roughly fourfold.
Limitations
The read limitations don't interfere with the driver's operation in any way: reading is simply accelerated for some data types and works as usual for the others.
clickhouse-driver
Python driver for ClickHouse
This part of the documentation covers basic classes of the driver: Client, Connection and others.
Client¶
Client for communication with the ClickHouse server. A single connection is established per instance of the client.
| Parameters: | settings – Dictionary of settings that is passed with every query. Defaults to None (no additional settings). See all available settings in the ClickHouse docs. |
|---|
Disconnects from the server.
execute(query, params=None, with_column_types=False, external_tables=None, query_id=None, settings=None, types_check=False, columnar=False)¶
Establishes a new connection if one hasn't been established yet. After query execution the connection remains open for subsequent queries. If the connection can't be reused, it is closed and a new connection is created.
execute_iter(query, params=None, with_column_types=False, external_tables=None, query_id=None, settings=None, types_check=False)¶
New in version 0.0.14.
execute_with_progress(query, params=None, with_column_types=False, external_tables=None, query_id=None, settings=None, types_check=False, columnar=False)¶
classmethod from_url(url)¶
Return a client configured from the given URL.
Any additional querystring arguments will be passed along to the Connection class's initializer.
Connection¶
Represents a connection between the client and the ClickHouse server.
Closes the connection between server and client. Frees resources: e.g. closes the socket.
QueryResult¶
Stores query results from multiple blocks.
get_result()¶
| Returns: | stored query result. |
|---|
ProgressQueryResult¶
Stores query results and progress information from multiple blocks. Provides iteration over query progress.
get_result()¶
| Returns: | stored query result. |
|---|
IterQueryResult¶
Provides iteration over returned data by chunks (streaming by chunks).
Features¶
External data for query processing¶
You can pass external data along with a query:
Settings¶
There are a lot of ClickHouse server settings. Settings can be specified during Client initialization:
Each setting can be overridden in an execute statement:
Compression¶
A client with compression support can be constructed as follows:
CityHash algorithm notes¶
Unfortunately, the ClickHouse server ships with a built-in old version of the CityHash algorithm (1.0.2). That's why we can't use the original CityHash package. The older version is published separately on PyPI.
Secure connection¶
Specifying query id¶
You can manually set a query identifier for each query. A UUID, for example:
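For example (the commented execute call assumes a running server; query_id is the driver's documented keyword argument):

```python
import uuid

# A fresh identifier per query lets you find it later in
# system.processes, or cancel it with KILL QUERY.
query_id = str(uuid.uuid4())

# With clickhouse-driver this would be passed as:
#   client.execute('SELECT 1', query_id=query_id)
print(query_id)
```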
You can cancel a query with a specific id by sending another query with the same query id, if the option replace_running_query is set to 1.
Query results are fetched by the same instance of Client that emitted the query.
Retrieving results in columnar form¶
Columnar form can sometimes be more useful.
Data types checking on INSERT¶
Data type checks are disabled on INSERT queries for performance. You can turn them on with the types_check option:
Query execution statistics¶
The client stores statistics about the last query execution. They can be obtained by accessing the last_query attribute. Statistics are sent from the ClickHouse server and calculated on the client side. last_query contains info about:
profile: rows before limit
Receiving server logs¶
Query logs can be received from the server by using the send_logs_level setting:
Multiple hosts¶
New in version 0.1.3.
This option is good for a ClickHouse cluster with multiple replicas.
In the example above, on every new connection the driver will use the following sequence of hosts if the previous host is unavailable:
All queries within an established connection will be sent to the same host.
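The failover order can be pictured with a small sketch (illustrative only, not the driver's internal implementation): given a host list, a new connection attempt starts from the host after the last failure and wraps around.

```python
from itertools import cycle, islice

def connection_attempts(hosts, start=0):
    """Illustrative: the order in which hosts would be tried for one
    new connection, wrapping around at the end of the list."""
    return list(islice(cycle(hosts), start, start + len(hosts)))

hosts = ["host1:9000", "host2:9000", "host3:9000"]
print(connection_attempts(hosts))           # starts at host1
print(connection_attempts(hosts, start=1))  # after host1 fails: host2, host3, host1
```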
Python DB API 2.0¶
New in version 0.1.3.
This driver also implements the DB API 2.0 specification. It can be useful for various integrations.
Threads may share the module and connections.
The Connection class is just a wrapper for handling multiple cursors (clients) and does not initiate an actual connection to the ClickHouse server.
There are some non-standard ClickHouse-related Cursor methods for external data, settings, etc.
For automatic disposal, Connection and Cursor instances can be used as context managers:
The following keys, when passed in settings, are used for configuring the client itself:
insert_dataframe(query, dataframe, transpose=True, external_tables=None, query_id=None, settings=None)¶
New in version 0.2.0.
Inserts a pandas DataFrame with the specified query. Returns the number of inserted rows.
query_dataframe(query, params=None, external_tables=None, query_id=None, settings=None)¶
New in version 0.2.0.
Queries a DataFrame with the specified SELECT query.
clickhouse-driver’s Issues
Support Interval Types
Recent ClickHouse versions support intervals/timedeltas, but they have special types.
input_format_skip_unknown_fields setting seems to have no effect
According to the Clickhouse documentation, an exception should be raised if input_format_skip_unknown_fields is set to false
I can reproduce this behavior as expected with clickhouse-client, but clickhouse-driver seems to ignore this setting. In the example below, "z" is not part of the schema.
clickhouse-driver==0.0.10
ClickHouse server version 1.1.54362
output: (no exception raised in either case)
Conda forge feedstock
I'm working on a ClickHouse backend for ibis.
Ibis is installable from both pip and conda. I'd like to use clickhouse-driver, but currently clickhouse-driver and clickhouse-cityhash don't have conda packages.
I've already created the recipes, but conda-forge packages require maintainers. Would you please create the feedstocks?
DBAPI Support
Thanks for the hard work on this great project.
Does this driver already implement the DB API? If not, is there a plan to?
Wrong DateTime insert
After inserting datetime.datetime(2018, 1, 19, 10) through this driver, I see the value '2018-01-19 13:00:00' in the table.
The timezone on both my computer and the ClickHouse server is Moscow.
What must I do to see '2018-01-19 10:00:00' after the insert?
Pandas interop
@xzkostyan would you like to include pandas support?
Doesn’t work with ipv6-only hosts
SELECT INTO OUTFILE
Does the current version of the driver ignore SELECT INTO OUTFILE?
There is no difference between the queries
select * from log LIMIT 10000 INTO OUTFILE '/var/tmp/test123.csv' FORMAT TabSeparated
and
select * from log LIMIT 10000
clickhouse-driver version 0.0.16
Feature 'FORMAT JSON' needed
Can you provide 'FORMAT JSON' as clickhouse-client does?
Extended support of IPv4 and IPv6 column types
I am afraid to come with a form of specific request here. I opened this issue last week on the Yandex/ClickHouse repository: ClickHouse/ClickHouse#2605. It was about issues with support of specific column types to store IPv4 and IPv6 data.
I didn't get any sort of positive answer from them, at least short term.
I was wondering if it could make sense to develop a form of additional types in your code such that columns named IPv4_. or IPv6_. benefit from a specific behaviour. The idea would be to apply a conversion function in your code and introduce those columns as type INET (similar to PostgreSQL): https://www.postgresql.org/docs/9.1/static/datatype-net-types.html.
By having this feature, this type could be handled by upper layers like your SQLAlchemy driver and related UI and frontend.
Memory Overflow
I'm running ClickHouse in production, using the asyncio wrapper. But sometimes I get an issue when inserting into the database.
I don’t know if the issue is the server or the client.
execute waits a long time
[[email protected] tmp]# python
Python 2.6.6 (r266:84292, Aug 18 2016, 15:13:37)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-17)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
from clickhouse_driver import Client
client = Client('192.168.133.2', 18123, 'client_report', 'admin', '************', insert_block_size=1)
client.execute('SHOW TABLES')
^CTraceback (most recent call last):
File "", line 1, in
File "/usr/lib/python2.6/site-packages/clickhouse_driver/client.py", line 159, in execute
self.connection.force_connect()
File "/usr/lib/python2.6/site-packages/clickhouse_driver/connection.py", line 122, in force_connect
self.connect()
File "/usr/lib/python2.6/site-packages/clickhouse_driver/connection.py", line 188, in connect
self.receive_hello()
File "/usr/lib/python2.6/site-packages/clickhouse_driver/connection.py", line 263, in receive_hello
packet_type = read_varint(self.fin)
File "/usr/lib/python2.6/site-packages/clickhouse_driver/reader.py", line 29, in read_varint
i = _read_one(f)
File "/usr/lib/python2.6/site-packages/clickhouse_driver/reader.py", line 14, in _read_one
c = f.read(1)
File "/usr/lib64/python2.6/socket.py", line 383, in read
data = self._sock.recv(left)
KeyboardInterrupt
Extremely slow on large select, http protocol almost 10 times faster
It seems that selecting large datasets using the native client is extremely slow. Here is my benchmark https://gist.github.com/dmitriyshashkin/6a4849bdcf882ba340cdfbc1990da401
Initially, I’ve encountered this behavior on my own dataset, but I was able to reproduce it using the dataset and the structure described here https://clickhouse.yandex/docs/en/getting_started/example_datasets/ontime/
To simplify things a little bit I’ve used the data for just one month: http://transtats.bts.gov/PREZIP/On_Time_On_Time_Performance_2017_12.zip
As you can see, the fastest way to get the data is by using the HTTP protocol with requests and pandas. The problem gets worse as the number of rows grows: on my own dataset with 5M rows I waited for 1 hour before I had to interrupt the process. The bottleneck is not ClickHouse itself; the "top" command shows that all the work is done by Python at 100% CPU utilization, while ClickHouse is almost idle.
Set query settings
If a setting is set to a value different from the default one, then it should be sent to the server.
It could be done in connection.send_query() before write_binary_str('', self.fout)  # end of query settings
Better support for writing bytes
Inserting data into a Date column causes an error
When I try inserting into a DateTime column, it succeeds.
How can I fix this problem? Thanks.
Timeout error
I'm getting a timeout error when trying to fetch results from ClickHouse using this package on Windows.
Any ideas how I can fix it?
P.S.
I checked the connection credentials via the JDBC driver.
Error if enum key is blank
How to reproduce
Create a table with an enum field and insert a few rows
Try to select with the Python driver:
Links related issues
Do you need to explicitly close the connection when using clickhouse_driver?
Set settings.limits
Multithreaded client
I've had some issues using clickhouse_driver in a multithreaded asyncio environment. The connection does not seem to be thread-safe. I'm not sure this is the best approach to solve the issues I've encountered, but here is a pooled connection implementation I am using:
column/string value may be None: 'NoneType' object has no attribute 'encode'
The column/string value may be None:
def try_encode(self, value):
    if not isinstance(value, bytes):
        return value.encode('utf-8')
    return value
Unknown type Tuple(Float64, Float64)
Can't execute a query with a column of type Array(Tuple(Float64, Float64)) in the result.
/.pyenv/versions/jupyter3.6.4/lib/python3.6/site-packages/clickhouse_driver/columns/service.py in read_column(context, column_spec, n_items, buf)
     65 def read_column(context, column_spec, n_items, buf):
     66     column_options = {'context': context}
---> 67     column = get_column_by_spec(column_spec, column_options=column_options)
     68     return column.read_data(n_items, buf)
     69
Accessing rows_before_limit property through API
The BlockStreamProfileInfo.rows_before_limit property is useful to get the rows count for pagination without running an extra query, but there does not seem to be any way to access it through the API (i.e. the client silently ignores the PROFILE_INFO packet).
As a workaround I hacked together a small change in Client.receive_packet where it saves the last packet.profile_info in the Client instance and we can use it like this:
So it's not pretty, but it seems to work fine. Anyway, I think it would make sense to have it in the main API without additional hacks, but I'm not sure what the best place to put it is. Do you have any thoughts on that? Or perhaps there are additional caveats that caused this to be left out of the API's scope?
Error writing string while using Superset and the ClickHouse driver (text encoding not set to UTF-8)
Something new is happening while adding a ClickHouse DB to Superset, at the time Superset tries to get table metadata.
Get progress info
It seems the Progress packets are received and managed, but there is no way to get the info from the Client or Connection objects. Here is an API proposition with a fetch* method; this is common in database APIs.
AttributeError when receiving ServerPacketTypes.PROFILE_INFO
After updating to 0.0.16 the following error appears:
AttributeError: 'Client' object has no attribute 'execute_iter'
It looks like there is no execute_iter method in version 0.0.10.
from clickhouse_driver import Client
client = Client(host='localhost', port=2102, database='shard01')
rows_gen = client.execute_iter('select * from query.TCC_S1', settings=settings)
AttributeError Traceback (most recent call last)
in ()
      2 client = Client(host='localhost', port=2102, database='shard01')
      3 settings = {'max_block_size': 100000}
----> 4 rows_gen = client.execute_iter('select * from query.TCC_S1', settings=settings)
AttributeError: 'Client' object has no attribute 'execute_iter'
SQL injections?
Thanks for this great library. I was wondering whether it has any protection against SQL injection? https://github.com/mymarilyn/clickhouse-driver/blob/master/src/util/escape.py does not seem to have any checks for injected queries.
Iterator support
Is there any way for the return of a SELECT query to be a row iterator instead of loading the whole query result into memory? Thank you!
Insert NULL values for Nullable types
Hi, I have created a table with Nullable columns. How do I pass NULL values via bulk insert?
It seems there is no analogue value for NULL; None is not working atm.
Actually, None is working and it inserts NULL values 🙂
TooLargeStringSize: Code: 131.
I often get the error TooLargeStringSize: Code: 131 when inserting data into a table. How can I prevent it? I have already tried inserting really small batches.
Meta/Column names from query
Meta/column names: how do I get them?
How to get column names with SELECT
Not working under CentOS 7 after installing from pip
Hi,
I'm experiencing some strange issues with the module, and there are no issues with any other modules:
It installed from pip without issues:
Example code test.py
When using a domain instead of an IP address, the DNS resolution doesn't change.
I use a domain instead of an IP address in order to do load balancing and failover.
But when I change the domain mapping, the script continues connecting to the original IP address.
For example, I use the domain 'xxx.test.com', which maps to ip1 and ip2, and start the script; the data is written to ip1 and ip2 round-robin.
After re-mapping the domain to ip2 and ip3 (that is, replacing ip1 with ip3), the script still continues to write to ip1.
Big INSERT ends in timeout
I have an issue with a big insert query with > 2000 columns.
It eventually runs with the native clickhouse client;
after some config changes it runs in 0.5 seconds.
Now I have tried to run the same query with clickhouse-driver.
It times out.
Before this, I had the same script working with pymysql.
Any idea how to fix this?
Traceback (most recent call last):
File "Click_v01.py", line 154, in
client.execute(sql)
File "/usr/lib/python3.4/site-packages/clickhouse_driver/client.py", line 73, in execute
query_id=query_id, settings=settings
File "/usr/lib/python3.4/site-packages/clickhouse_driver/client.py", line 88, in process_ordinary_query
return self.receive_result(with_column_types=with_column_types)
File "/usr/lib/python3.4/site-packages/clickhouse_driver/client.py", line 19, in receive_result
block = self.receive_block()
File "/usr/lib/python3.4/site-packages/clickhouse_driver/client.py", line 38, in receive_block
packet = self.connection.receive_packet()
File "/usr/lib/python3.4/site-packages/clickhouse_driver/connection.py", line 234, in receive_packet
packet.type = packet_type = read_varint(self.fin)
File "/usr/lib/python3.4/site-packages/clickhouse_driver/reader.py", line 46, in read_varint
i = _read_one(f)
File "/usr/lib/python3.4/site-packages/clickhouse_driver/reader.py", line 31, in _read_one
c = f.read(1)
File "/usr/lib64/python3.4/socket.py", line 378, in readinto
return self._sock.recv_into(b)
socket.timeout: timed out
Nothing type
Recent ClickHouse has a new type, Nothing, which is currently unhandled by clickhouse-driver.
Selecting Null raises:
Return column type spec
Currently the client returns type names.
It would be great to get the type spec instead, including the nullable flag and/or inner type spec.
clickhouse-driver==0.0.10
ClickHouse server version 1.1.54362
It is also possible to omit values, in which case the default value of the column is inserted.
However, clickhouse-driver instead throws an exception:
Working example using clickhouse-client:
I was able to get past the KeyError in the Traceback above by changing
I get to the point of sending the data to the server but get the following error, which unfortunately I don't have the time to dig into right now to come up with a possible PR/solution:
This would be a great feature so that JSON objects don’t always have to be fully specified in the client code!
Dates are off by one
date.fromtimestamp is supposed to take a local timestamp, not a UTC one. I am in a timezone behind UTC, so all dates are one day behind.
To fix this, you need to use datetime.datetime.utcfromtimestamp().date() instead:
return datetime.datetime.utcfromtimestamp(value * self.offset).date()
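A quick self-contained illustration of the difference (the Date value here is arbitrary):

```python
import datetime

# ClickHouse Date values are days since the Unix epoch, timezone-free.
days = 17500            # an arbitrary Date value
ts = days * 86400       # the same value as seconds

# fromtimestamp interprets the value in the *local* timezone, so in any
# zone behind UTC the date comes out one day early:
local_date = datetime.date.fromtimestamp(ts)

# utcfromtimestamp gives the timezone-independent answer:
utc_date = datetime.datetime.utcfromtimestamp(ts).date()
print(local_date, utc_date)
```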
Insert Buffer?
The documentation recommends inserting chunks of at least 1000 rows at once. Is there an easy way to add some sort of buffer to gather inserts and thus improve performance? Of course, one could use a simple list and append to it, inserting when it reaches a specific size, but things get more complicated when multiprocessing is involved.
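One way to sketch such a buffer for the single-process case (illustrative only; in real use the flush callable would wrap client.execute):

```python
class InsertBuffer:
    """Gathers rows and flushes them in batches via a supplied callable."""

    def __init__(self, flush, batch_size=1000):
        self._flush = flush
        self._batch_size = batch_size
        self._rows = []

    def add(self, row):
        self._rows.append(row)
        if len(self._rows) >= self._batch_size:
            self.flush()

    def flush(self):
        if self._rows:
            self._flush(self._rows)
            self._rows = []

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        self.flush()  # push out any remaining rows on exit

# In real use, flush would be something like:
#   lambda rows: client.execute('INSERT INTO t (a, b) VALUES', rows)
batches = []
with InsertBuffer(batches.append, batch_size=2) as buf:
    for i in range(5):
        buf.add((i, i * i))
print([len(b) for b in batches])  # [2, 2, 1]
```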
Asyncio client
First of all, great to have a native driver!
Nested datatype
Tuple support
I can't return unique combinations of two columns (e.g. src and dst).
Executing a query with SELECT DISTINCT(field1, field2) causes the following error when returning data from ClickHouse:
Set a query ID
ClickHouse allows setting an ID on queries; you can see these IDs with:
This is useful when you want to cancel a query from an external process, as explained here: https://stackoverflow.com/questions/40546983/how-to-kill-a-process-query-in-clickhouse
I propose that the connection.send_query() method accept a string argument to name the query. It could be used in the client.process_*_query() functions and in execute() as an optional argument:
Column names
client.execute('SELECT * FROM test3')
client.execute('SELECT a, b, c FROM test3')
and I will get a list of tuples (one tuple for each row), but I don't have column names.
Is it somehow possible to get all column names (named tuple, or something) together with the data, in order to simplify further data manipulation?
NumPy arrays support¶
New in version 0.1.6.
NumPy arrays are not used when reading nullable columns and columns of unsupported types.
Direct loading into NumPy arrays increases performance and lowers memory requirements on large numbers of rows.
Direct loading into a pandas DataFrame is also supported by using query_dataframe:
Writing a pandas DataFrame is also supported with insert_dataframe:
Quickstart¶
This page gives a good introduction to clickhouse-driver. It assumes you already have clickhouse-driver installed. If you do not, head over to the Installation section.
A minimal working example looks like this:
This code will show all tables from the 'default' database.
There are two conceptual types of queries:
Selecting data¶
A simple select query looks like:
Of course, queries can and should be parameterized to avoid SQL injection:
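The original example was not captured here, but the danger of plain string formatting, and the parameterized alternative, can be sketched as follows (the commented call assumes a live server and a hypothetical users table):

```python
# Naive string formatting is injectable: the payload becomes part of
# the statement itself.
user_input = "1; DROP TABLE users"
unsafe = "SELECT * FROM users WHERE id = %s" % user_input
print(unsafe)  # the DROP TABLE payload is now inside the query text

# With clickhouse-driver, pass the values separately and let the driver
# escape them (needs a running server, shown for illustration only):
#   client.execute('SELECT * FROM users WHERE id = %(id)s',
#                  {'id': user_input})
```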
Selecting data with progress statistics¶
Streaming results¶
When you are dealing with large datasets, block-by-block results streaming may be useful:
Inserting data¶
Insert queries in the Native protocol are a little bit tricky because of ClickHouse's columnar nature, and because we're using Python.
An INSERT query consists of two parts: the query statement and the query values. Query values are split into chunks called blocks. Each block is sent in binary columnar form.
As the data in each block is sent in binary form, we should not serialize it into a string using %(a)s substitution and then deserialize it back into Python types.
This INSERT would be extremely slow if executed with thousands of rows of data:
To insert data efficiently, provide the data separately and end your statement with a VALUES clause:
You can use any iterable yielding lists, tuples or dicts.
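For example, a generator keeps memory use flat regardless of the row count (the commented execute call assumes a live server and a hypothetical table t):

```python
# Rows can come from any iterable - here a generator, so the full data
# set never needs to exist in memory at once:
def rows():
    for i in range(100_000):
        yield (i, i * 2)

# With a running server this would be:
#   client.execute('INSERT INTO t (x, y) VALUES', rows())
first = next(rows())
print(first)  # (0, 0)
```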
If data is not passed, the connection will be terminated after a timeout.
The following WILL NOT work:
Of course for INSERT … SELECT queries data is not needed:
ClickHouse will execute this query like a usual SELECT query.
DDL queries can be executed in the same way SELECT queries are executed:
Asynchronous behavior¶
Every ClickHouse query is assigned an identifier to enable request execution tracking. However, the ClickHouse native protocol is synchronous: all incoming queries are executed consecutively. clickhouse-driver does not yet implement a connection pool. To utilize ClickHouse's asynchronous capability, you should either use multiple Client instances or implement a queue.
clickhouse-driver
Python driver for ClickHouse
Quickstart
This page gives a good introduction to clickhouse-driver. It assumes you already have clickhouse-driver installed. If you do not, head over to the Installation section.
A minimal working example looks like this:
This code will show all tables from ‘default’ database.
There are two conceptual types of queries:
Selecting data
Simple select query looks like:
Of course queries can and should be parameterized to avoid SQL injections:
Customising SELECT output with the FORMAT clause is not supported.
Selecting data with progress statistics
Streaming results
When you are dealing with large datasets, streaming results block by block may be useful:
Inserting data
Insert queries over the native protocol are a little tricky because of ClickHouse’s columnar nature, and because we’re using Python.
An INSERT query consists of two parts: the query statement and the query values. The values are split into chunks called blocks, and each block is sent in binary columnar form.
Since data in each block is sent in binary form, we should not serialize it into strings with %(a)s substitution and then deserialize it back into Python types.
This INSERT would be extremely slow if executed with thousands of rows of data:
To insert data efficiently, provide data separately, and end your statement with a VALUES clause:
You can use any iterable yielding lists, tuples or dicts.
If data is not passed, the connection will be terminated after a timeout.
The following WILL NOT work:
ClickHouse will execute this query like a usual SELECT query.
Inserting data in different formats with FORMAT clause is not supported.
See Inserting data from CSV file if you need to insert data in a custom format.
DDL queries can be executed in the same way SELECT queries are executed:
Async and multithreading
Every ClickHouse query is assigned an identifier to enable request execution tracking. However, ClickHouse native protocol is synchronous: all incoming queries are executed consecutively. Clickhouse-driver does not yet implement a connection pool.
To utilize ClickHouse’s asynchronous capability you should either use multiple Client instances or implement a queue.
The same applies to multithreading: queries from different threads cannot share one Client instance over a single connection. Use a separate client for each thread.
However, if you use the DB API to communicate with the server, each cursor creates its own Client instance, which makes communication thread-safe.
aiochclient 2.2.0
pip install aiochclient
Released: Aug 18, 2022
Async http clickhouse client for python 3.6+
License: MIT License (MIT)
Tags clickhouse, async, python, aiohttp
aiochclient
An async http(s) ClickHouse client for python 3.6+ supporting type conversion in both directions, streaming, lazy decoding on select queries, and a fully typed interface.
Table of Contents
Installation
You can use it with either aiohttp or httpx http connectors.
To use with aiohttp install it with command:
Or aiochclient[aiohttp-speedups] to install with extra speedups.
To use with httpx install it with command:
Or aiochclient[httpx-speedups] to install with extra speedups.
Installing with [*-speedups] adds the following:
Additionally the installation process attempts to use Cython for a speed boost (roughly 30% faster).
Quick Start
Connecting to ClickHouse
aiochclient needs aiohttp.ClientSession or httpx.AsyncClient to connect to ClickHouse:
Querying the database
For fetching all rows at once use the fetch method:
For fetching first row from result use the fetchrow method:
You can also use fetchval method, which returns first value of the first row from query result:
With async iteration on the query results stream you can fetch multiple rows without loading them all into memory at once:
Use fetch / fetchrow / fetchval / iterate for SELECT queries, and execute (or any of the above) for INSERT and all other queries.
Working with query results
All fetch queries return rows as lightweight, memory-efficient objects. Before v1.0.0, rows were only returned as tuples. All rows have a full mapping interface, so you can get fields by name or by index:
Documentation
To check out the API docs, visit the readthedocs site.
Type Conversion
aiochclient automatically converts types from ClickHouse to python types and vice-versa.
Connection Pool Settings
aiochclient uses the aiohttp.TCPConnector to determine pool size. By default, the pool limit is 100 open connections.
Notes on Speed
Using uvloop and installing aiochclient with speedups is highly recommended for the sake of speed. Some recent benchmarks on our machines, without parallelization:
Note: these benchmarks are system dependent
airflow-clickhouse-plugin 0.8.2
pip install airflow-clickhouse-plugin
Released: Jun 11, 2022
License: MIT License
Tags clickhouse, airflow
Requires: Python >=3.6.*
Airflow ClickHouse Plugin
Features
Installation and dependencies
Requires apache-airflow and clickhouse-driver (installed automatically by pip). Primarily supports Airflow 2.0–2.3. Later versions are expected to work properly but may not be fully tested. Use plugin versions below 0.6.0 (e.g. 0.5.7.post1) to preserve compatibility with Airflow 1.10.6 (this version has long-term support on Google Cloud Composer).
Note on pandas dependency
Usage
ClickHouseOperator Reference
To import ClickHouseOperator use: from airflow_clickhouse_plugin.operators.clickhouse_operator import ClickHouseOperator
The result of the last query is pushed to XCom.
ClickHouseHook Reference
To import ClickHouseHook use: from airflow_clickhouse_plugin.hooks.clickhouse_hook import ClickHouseHook
Supported kwargs of constructor ( __init__ method):
Supports all the methods of the Airflow BaseHook including:
ClickHouseSqlSensor Reference
The sensor fully inherits from the Airflow SQLSensor and therefore fully implements its interface, using ClickHouseHook to fetch the SQL execution result. It supports templating of the sql argument.
ClickHouse Connection schema
clickhouse_driver.Client is initialized with attributes stored in the Airflow Connection. The mapping of the attributes is listed below:
| Airflow Connection attribute | Client.__init__ argument |
|---|---|
| host | host |
| port | port |
| schema | database |
| login | user |
| password | password |
If you pass database argument to ClickHouseOperator or ClickHouseHook explicitly then it is passed to the Client instead of the schema attribute of the Airflow connection.
Extra arguments
For example, if the Airflow connection contains extra={"secure": true}, then Client.__init__ will receive the secure=True keyword argument in addition to the other non-empty connection attributes.
Default values
If an Airflow connection attribute is not set, it is not passed to the Client at all. In that case the default value of the corresponding clickhouse_driver.Connection argument is used (e.g. user defaults to 'default').
This means that the Airflow ClickHouse Plugin does not itself define any default values for the ClickHouse connection. You may fully rely on the default values of the clickhouse-driver version you use. The only exception is host: if that attribute of the Airflow connection is not set, 'localhost' is used.
Default connection
Examples
ClickHouseOperator Example
ClickHouseHook Example
Important note: don't try to insert values using the literal form ch_hook.run('INSERT INTO some_ch_table VALUES (1)'). clickhouse-driver requires values for an INSERT query to be provided via parameters due to specifics of the native ClickHouse protocol.
ClickHouseSqlSensor Example
How to run tests
Unit tests
Integration tests
Integration tests require access to ClickHouse server. Tests use connection URI defined via environment variable AIRFLOW_CONN_CLICKHOUSE_DEFAULT with clickhouse://localhost as default.
All tests
Github Actions
A GitHub Action is set up for this project.
Run tests using Docker
Run ClickHouse server inside Docker:
The above command will open bash inside the container.
Install dependencies into container and run tests (execute inside container):
How to upload to PyPI
Run tests for test PyPI version:
Pandas test may fail.
Test public PyPI (run clickhouse container), with pandas:
madiedinro/simple-clickhouse
Simple ClickHouse lib
Install using pip from pypi repository
Or latest version from git
When used within Rockstat, connection parameters are not required; they are filled in automatically from environment variables.
Selecting without decoding
Selecting as a stream of dicts
Disabling decoding for streaming data
To get the result as strings, use bytes_decoder
Executing sql statements
For writing data, managing the database, and other (non-select) operations, use the run method.
It can also be used for "manual" data writes.
Microbatch writing using context manager
new
On context exit, all data will be flushed.
The old, manually controlled mechanism:
Some Simple Magic
To create an instance of TableDiscovery, call
Either records or columns must be provided.
Detect using present data
After the initial table auto-discovery you should switch to a fixed layout. The easy way is to call TableDiscovery.pycode(), which returns the corresponding code.
Correctly detected / explicitly set data types
TableDiscovery.int(*args) sets columns to int
Set date columns
Set date column
Set str columns
Set string column
Set primary key columns
Set metrics
others are marked as dimensions
Set dimensions
others are marked as metrics
Print table create statement / execute query
Difference handling. Be careful: this is currently a proof of concept.
All records will be flushed to DB on context exit
Executing a query and reading the whole result at once
Fetching records as a stream
Executing SQL operations
All data will be flushed on context exit
The MIT License (MIT)
Copyright (c) 2018-2019 Dmitry Rodin
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the «Software»), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED «AS IS», WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
asynch 0.1.9
pip install asynch
Released: Jun 2, 2021
An asyncio driver for ClickHouse with native TCP protocol support
License: Apache Software License (Apache-2.0)
Author: long2ice
Tags ClickHouse, asyncio, driver
Requires: Python >=3.7
asynch
Introduction
asynch is an asyncio ClickHouse Python driver with native (TCP) interface support, which reuses most of clickhouse-driver and complies with PEP 249.
Install
Usage
Connect to ClickHouse
Create table by sql
Use DictCursor to get result with dict
Insert data with dict
Insert data with tuple
Use connection pool
ThanksTo
License
This project is licensed under the Apache-2.0 License.
Project details
Project links
Statistics
View statistics for this project via Libraries.io, or by using our public dataset on Google BigQuery
License: Apache Software License (Apache-2.0)
Author: long2ice
Tags ClickHouse, asyncio, driver
Requires: Python >=3.7, long2ice
clickhouse-sqlalchemy 0.2.2
pip install clickhouse-sqlalchemy
Released: Aug 24, 2022
Simple ClickHouse SQLAlchemy Dialect
License: MIT License (MIT)
Tags ClickHouse, db, database, cloud, analytics
Requires: Python >=3.6
ClickHouse SQLAlchemy
A ClickHouse dialect for SQLAlchemy.
Documentation
Usage
native [recommended] (TCP) via clickhouse-driver
http via requests
Insert some data
And query inserted data
License
ClickHouse SQLAlchemy is distributed under the MIT license.
clickhouse-driver
Python driver for ClickHouse
Performance
This section compares clickhouse-driver performance over the native interface with the TSV and JSONEachRow formats available over the HTTP interface.
clickhouse-driver returns already-parsed row items in Python data types; the driver performs all transformations for you.
When you read data over HTTP, you may need to cast strings into Python types yourself.
Test data
Sample data for testing is taken from the ClickHouse docs.
Create database and table:
Download some data for the year 2017:
Insert data into ClickHouse:
Required packages
For fast JSON parsing we’ll use the ujson package:
Versions
Machine: Linux ThinkPad-T460 4.4.0-177-generic #207-Ubuntu SMP Mon Mar 16 01:16:10 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Python: CPython 3.6.5 (default, May 30 2019, 14:48:31) [GCC 5.4.0 20160609]
Benchmarking
The scripts below can be benchmarked with the following one-liner:
Time will measure:
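The one-liner might look like this (GNU time on Linux; the flags are an assumption, and the original docs' exact invocation may differ):

```shell
# %e = elapsed wall-clock seconds, %M = peak resident memory in KiB
/usr/bin/time -f '%e s elapsed, %M KiB peak RSS' \
    python3 -c 'print(sum(range(10**6)))'
```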
Plain text without parsing
Let’s take the plain-text response from the ClickHouse server as a baseline.
Fetching unparsed data with pure requests (1)
Parsed rows
A line split into elements will be considered “parsed” for the TSV format (2)
Now we cast each element to its data type (2.5)
The JSONEachRow format can be loaded with json.loads (3)
Get fully parsed rows with clickhouse-driver in Native format (4)
Iteration over rows
Iteration over TSV (5)
Now we cast each element to its data type (5.5)
Iteration over JSONEachRow (6)
Iteration over rows with clickhouse-driver in Native format (7)
Iteration over string rows
OK, but what if we need only the string columns?
Iteration over TSV (8)
Iteration over JSONEachRow (9)
Iteration over string rows with clickhouse-driver in Native format (10)
Iteration over int rows
Iteration over TSV (11)
Iteration over JSONEachRow (12)
Iteration over int rows with clickhouse-driver in Native format (13)
Results
This table contains memory and timing benchmark results for the snippets above.
JSON in the table is shorthand for JSONEachRow.
| Rows | 50k | 131k | 217k | 450k | 697k |
|---|---|---|---|---|---|
| Plain text without parsing: timing | |||||
| Naive requests.get TSV (1) | 0.40 s | 0.67 s | 0.95 s | 1.67 s | 2.52 s |
| Naive requests.get JSON (1) | 0.61 s | 1.23 s | 2.09 s | 3.52 s | 5.20 s |
| Plain text without parsing: memory | |||||
| Naive requests.get TSV (1) | 49 MB | 107 MB | 165 MB | 322 MB | 488 MB |
| Naive requests.get JSON (1) | 206 MB | 564 MB | 916 MB | 1.83 GB | 2.83 GB |
| Parsed rows: timing | |||||
| requests.get TSV (2) | 0.81 s | 1.81 s | 3.09 s | 7.22 s | 11.87 s |
| requests.get TSV with cast (2.5) | 1.78 s | 4.58 s | 7.42 s | 16.12 s | 25.52 s |
| requests.get JSON (3) | 2.14 s | 5.65 s | 9.20 s | 20.43 s | 31.72 s |
| clickhouse-driver Native (4) | 0.73 s | 1.40 s | 2.08 s | 4.03 s | 6.20 s |
| Parsed rows: memory | |||||
| requests.get TSV (2) | 171 MB | 462 MB | 753 MB | 1.51 GB | 2.33 GB |
| requests.get TSV with cast (2.5) | 135 MB | 356 MB | 576 MB | 1.15 GB | 1.78 GB |
| requests.get JSON (3) | 139 MB | 366 MB | 591 MB | 1.18 GB | 1.82 GB |
| clickhouse-driver Native (4) | 135 MB | 337 MB | 535 MB | 1.05 GB | 1.62 GB |
| Iteration over rows: timing | |||||
| requests.get TSV (5) | 0.49 s | 0.99 s | 1.34 s | 2.58 s | 4.00 s |
| requests.get TSV with cast (5.5) | 1.38 s | 3.38 s | 5.40 s | 10.89 s | 16.59 s |
| requests.get JSON (6) | 1.89 s | 4.73 s | 7.63 s | 15.63 s | 24.60 s |
| clickhouse-driver Native (7) | 0.62 s | 1.28 s | 1.93 s | 3.68 s | 5.54 s |
| Iteration over rows: memory | |||||
| requests.get TSV (5) | 19 MB | 19 MB | 19 MB | 19 MB | 19 MB |
| requests.get TSV with cast (5.5) | 19 MB | 19 MB | 19 MB | 19 MB | 19 MB |
| requests.get JSON (6) | 20 MB | 20 MB | 20 MB | 20 MB | 20 MB |
| clickhouse-driver Native (7) | 56 MB | 70 MB | 71 MB | 71 MB | 71 MB |
| Iteration over string rows: timing | |||||
| requests.get TSV (8) | 0.40 s | 0.67 s | 0.80 s | 1.55 s | 2.18 s |
| requests.get JSON (9) | 1.14 s | 2.64 s | 4.22 s | 8.48 s | 12.96 s |
| clickhouse-driver Native (10) | 0.46 s | 0.91 s | 1.35 s | 2.49 s | 3.67 s |
| Iteration over string rows: memory | |||||
| requests.get TSV (8) | 19 MB | 19 MB | 19 MB | 19 MB | 19 MB |
| requests.get JSON (9) | 20 MB | 20 MB | 20 MB | 20 MB | 20 MB |
| clickhouse-driver Native (10) | 46 MB | 56 MB | 57 MB | 57 MB | 57 MB |
| Iteration over int rows: timing | |||||
| requests.get TSV (11) | 0.84 s | 2.06 s | 3.22 s | 6.27 s | 10.06 s |
| requests.get JSON (12) | 0.95 s | 2.15 s | 3.55 s | 6.93 s | 10.82 s |
| clickhouse-driver Native (13) | 0.43 s | 0.61 s | 0.86 s | 1.53 s | 2.27 s |
| Iteration over int rows: memory | |||||
| requests.get TSV (11) | 19 MB | 19 MB | 19 MB | 19 MB | 19 MB |
| requests.get JSON (12) | 20 MB | 20 MB | 20 MB | 20 MB | 20 MB |
| clickhouse-driver Native (13) | 41 MB | 48 MB | 48 MB | 48 MB | 49 MB |
Conclusion
If you need to fetch a significant number of rows from the ClickHouse server as text, the TSV format is your choice. See the “Iteration over string rows” results.
It doesn’t matter which interface you use if you manipulate only a small number of rows.
ClickSQL 0.1.9.4
pip install ClickSQL
Released: Nov 24, 2021
A python client for Clickhouse
License: MIT Licence
Author: sn0wfree
Tags ClickHouse, Databases, SQL, Python, Client
ClickSQL: ClickHouse client for Humans
ClickSQL is a Python client for the ClickHouse database that may help users work with ClickHouse in an easier and more Pythonic way. More information about ClickHouse can be found here.
Installation
pip install ClickSQL
Usage
Initial connection
to setup a database connection and send a heartbeat-check signal
Query
execute a SQL Query
execute a Query without SQL
Insert data
insert data into the database in various ways
Insert data via DataFrame
Insert data via SQL(Inner)
Create table
Create table by SQL
Create table by DataFrame
Contribution
Contributions are welcome: feel free to improve this package or submit an issue.
Author
Available functions or properties
In Process
schedule
Each ClickHouse type is deserialized to a corresponding Python type when SELECT queries are prepared. When serializing INSERT queries, clickhouse-driver accepts a broader range of Python types. The following ClickHouse types are supported by clickhouse-driver:
Date32 support is new in version 0.2.2.
Timezone support is new in version 0.0.11. DateTime64 support is new in version 0.1.3.
Integers are interpreted as seconds without a timezone (UNIX timestamps). Integers can be used when insertion into a datetime column is a bottleneck.
The use_client_time_zone setting is taken into consideration.
You can cast a DateTime column to integers if you are facing performance issues when selecting a large number of rows.
Due to Python’s current limitations, the minimal DateTime64 resolution is one microsecond.
A String column is encoded/decoded with the encoding specified by the strings_encoding setting. The default encoding is UTF-8.
You can specify custom encoding:
Encoding is applied to all string fields in query.
String columns can be returned without any decoding. In this case return values are bytes:
If a column has the FixedString type, upon returning from SELECT it may contain trailing zeroes in accordance with ClickHouse’s storage format. Trailing zeroes are stripped by the driver for convenience.
During INSERT, if the strings_as_bytes setting is not specified and a string cannot be encoded with the configured encoding, a UnicodeEncodeError will be raised.
Currently clickhouse-driver can’t handle empty enum values due to Python’s Enum mechanics: an Enum member name must not be empty. See the issue and workaround.
ClickHouse/clickhouse-odbc
ODBC Driver for ClickHouse
This is the official ODBC driver implementation for accessing ClickHouse as a data source.
For more information on ClickHouse go to ClickHouse home page.
For more information on what ODBC is go to ODBC Overview.
The canonical repo for this driver is located at https://github.com/ClickHouse/clickhouse-odbc.
See LICENSE file for licensing information.
Table of contents
Pre-built binary packages of the release versions of the driver available for the most common platforms at:
Note that since ODBC drivers are not used directly by a user, but rather accessed through applications, which in turn access the driver through an ODBC driver manager, you have to install the driver for the same architecture (32- or 64-bit) as the application that is going to use it. Moreover, both the driver and the application must be compiled for (and actually use at run time) the same ODBC driver manager implementation (we call them «ODBC providers» here). There are three supported ODBC providers:
If you have Homebrew installed (usually applicable to macOS only, but can also be available in Linux), just execute:
If you don’t see a package that matches your platform under Releases, or the version of your system differs significantly from those of the available packages, or you want to try a bleeding-edge version of the code that hasn’t been released yet, you can always build the driver manually from sources:
Native packages carry all the dependency information, so when you install the driver using a native package, all required run-time packages are installed automatically. If you use manual packaging, i.e. just extract the driver binaries to some folder, you also have to make sure manually that all the run-time dependencies are satisfied on your system:
The first step usually consists of registering the driver so that the corresponding ODBC provider is able to locate it.
The next step is defining one or more DSNs, associated with the newly registered driver, and setting driver-specific parameters in the body of those DSN definitions.
All this involves modifying dedicated registry keys in the case of MDAC, or editing the odbcinst.ini (driver registration) and odbc.ini (DSN definition) files for UnixODBC or iODBC, directly or indirectly.
This will be performed automatically using some default values if you are installing the driver using native installers.
Otherwise, if you are configuring manually, or need to modify the default configuration created by the installer, please see the exact locations of files (or registry keys) that need to be modified in the corresponding section below:
The list of DSN parameters recognized by the driver is as follows:
URL query string
Some of configuration parameters can be passed to the server as a part of the query string of the URL.
The list of parameters in the query string of the URL that are also recognized by the driver is as follows:
| Parameter | Default value | Description |
|---|---|---|
| database | default | Database name to connect to |
| default_format | ODBCDriver2 | Default wire format of the resulting data that the server will send to the driver. Formats supported by the driver are: ODBCDriver2 and RowBinaryWithNamesAndTypes |
Note that currently there is a difference in timezone handling between the ODBCDriver2 and RowBinaryWithNamesAndTypes formats: in ODBCDriver2, date and time values are presented to the ODBC application in the server’s timezone, whereas in RowBinaryWithNamesAndTypes they are converted to the local timezone. This behavior will be changed/parametrized in the future. If the server and ODBC application timezones are the same, date and time value handling will effectively be identical between the two formats.
Troubleshooting: driver manager tracing and driver logging
To debug issues with the driver, first things that need to be done are:
Building from sources
The general requirements for building the driver from sources are as follows:
Additional requirements exist for each platform, which also depend on whether packaging and/or testing is performed.
See the exact steps for each platform in the corresponding section below:
The list of configuration options recognized during the CMake generation step is as follows:
Run-time dependencies: Windows
All modern Windows systems come with preinstalled MDAC driver manager.
Run-time dependencies: macOS
Execute the following in the terminal (assuming you have Homebrew installed):
Execute the following in the terminal (assuming you have Homebrew installed):
Run-time dependencies: Red Hat/CentOS
Execute the following in the terminal:
Execute the following in the terminal:
Run-time dependencies: Debian/Ubuntu
Execute the following in the terminal:
Execute the following in the terminal:
Configuration: MDAC/WDAC (Microsoft/Windows Data Access Components)
To configure already installed drivers and DSNs, or create new DSNs, use Microsoft ODBC Data Source Administrator tool:
For full description of ODBC configuration mechanism in Windows, as well as for the case when you want to learn how to manually register a driver and have a full control on configuration in general, see:
Note that these keys are subject to the "Registry Redirection" mechanism, with caveats.
You can find a sample configuration for this driver here (just map the keys to the corresponding sections in the registry):
In short, you will usually end up editing /etc/odbcinst.ini and /etc/odbc.ini for system-wide driver and DSN entries, and ~/.odbc.ini for user-wide driver and DSN entries.
For more info, see:
You can find sample configuration for this driver here:
These samples can be added to the corresponding configuration files using the odbcinst tool (assuming the package is installed under /usr/local):
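The exact invocation was lost in conversion; a sketch of how odbcinst is typically used for this (the sample file paths are illustrative, not the package's confirmed layout):

```shell
# Register the driver system-wide from a sample template
sudo odbcinst -i -d -f /usr/local/share/doc/clickhouse-odbc/config/odbcinst.ini.sample
# Register a sample system DSN
sudo odbcinst -i -s -l -f /usr/local/share/doc/clickhouse-odbc/config/odbc.ini.sample
```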
In short, you will usually end up editing /etc/odbcinst.ini and /etc/odbc.ini for system-wide driver and DSN entries, and ~/.odbc.ini for user-wide driver and DSN entries.
In macOS, if these INI files exist, they are usually symbolic or hard links to /Library/ODBC/odbcinst.ini and /Library/ODBC/odbc.ini for system-wide configs, and to the equivalents under ~/Library/ODBC for user-wide configs, respectively.
For more info, see:
You can find sample configuration for this driver here:
Enabling driver manager tracing: MDAC/WDAC (Microsoft/Windows Data Access Components)
Comprehensive explanations (possibly with some irrelevant vendor-specific details) of how to enable ODBC driver manager tracing can be found at the following links:
Enabling driver manager tracing: UnixODBC
Comprehensive explanations (possibly with some irrelevant vendor-specific details) of how to enable ODBC driver manager tracing can be found at the following links:
Enabling driver manager tracing: iODBC
Comprehensive explanations (possibly with some irrelevant vendor-specific details) of how to enable ODBC driver manager tracing can be found at the following links:
Building from sources: Windows
CMake bundled with the recent versions of Visual Studio can be used.
An SDK required for building the ODBC driver is included in the Windows SDK, which in turn is bundled with Visual Studio.
All of the following commands have to be issued in a Visual Studio Command Prompt:
Clone the repo with submodules:
Enter the cloned source tree, create a temporary build folder, and generate the solution and project files in it:
Build the generated solution in-place:
…and, optionally, run tests (note that for non-unit tests, preconfigured driver and DSN entries must exist that point to the binaries generated in this build folder):
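The commands were lost in conversion; a sketch of the usual CMake flow for the steps above (the generator choice depends on your Visual Studio version):

```shell
git clone --recursive https://github.com/ClickHouse/clickhouse-odbc.git
cd clickhouse-odbc
mkdir build
cd build
# Generate the solution and project files
cmake -A x64 ..
# Build the generated solution in-place
cmake --build . --config RelWithDebInfo
# Optionally run tests (non-unit tests need preconfigured driver/DSN entries)
ctest -C RelWithDebInfo
```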
Building from sources: macOS
You will need macOS 10.14 or later, Xcode 10 or later with Command Line Tools installed, as well as up-to-date Homebrew available in the system.
Install Homebrew using the following command, and follow the printed instructions on any additional steps required to complete the installation:
Then, install the latest Xcode from the App Store. Open it at least once to accept the end-user license agreement and automatically install the required components.
Then, make sure that the latest Command Line Tools are installed and selected in the system:
Build-time dependencies: iODBC
Execute the following in the terminal:
Build-time dependencies: UnixODBC
Execute the following in the terminal:
Clone the repo recursively with submodules:
Enter the cloned source tree, create a temporary build folder, and generate a Makefile for the project in it:
Build the generated solution in-place:
…and, optionally, run tests (note that for non-unit tests, preconfigured driver and DSN entries must exist that point to the binaries generated in this build folder):
Building from sources: Red Hat/CentOS
Build-time dependencies: UnixODBC
Execute the following in the terminal:
Build-time dependencies: iODBC
Execute the following in the terminal:
All of the following commands must be issued in the same terminal session, right after the command above:
Clone the repo with submodules:
Enter the cloned source tree, create a temporary build folder, and generate a Makefile for the project in it:
Build the generated solution in-place:
…and, optionally, run tests (note that for non-unit tests, preconfigured driver and DSN entries must exist that point to the binaries generated in this build folder):
Building from sources: Debian/Ubuntu
Build-time dependencies: UnixODBC
Execute the following in the terminal:
Build-time dependencies: iODBC
Execute the following in the terminal:
This assumes that the system cc and c++ point to compilers that satisfy the minimum requirements from Building from sources.
If the version of cmake is not recent enough, you can install a newer version by following the instructions on one of these pages:
Clone the repo with submodules:
Enter the cloned source tree, create a temporary build folder, and generate a Makefile for the project in it:
Build the generated solution in-place:
…and, optionally, run tests (note that for non-unit tests, preconfigured driver and DSN entries must exist that point to the binaries generated in this build folder):
clickhouse-driver
Python driver for ClickHouse
Navigation
Related Topics
Quick search
Quickstart
This page gives a good introduction to clickhouse-driver. It assumes you already have clickhouse-driver installed. If you do not, head over to the Installation section.
A minimal working example looks like this:
This code shows all tables in the 'default' database.
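The example code itself was lost in the page conversion; a minimal sketch of what it likely contained (the host and the helper name are placeholders of mine):

```python
def table_names(client):
    # client.execute returns a list of row tuples; SHOW TABLES yields 1-tuples
    return [name for (name,) in client.execute('SHOW TABLES')]

if __name__ == '__main__':
    # Requires a running ClickHouse server and the clickhouse-driver package
    from clickhouse_driver import Client
    client = Client('localhost')
    print(table_names(client))
```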
There are two conceptual types of queries:
Selecting data
A simple select query looks like:
Of course, queries can and should be parameterized to avoid SQL injection:
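A sketch of a parameterized query in the driver's %(name)s placeholder style (the table and column here are invented for illustration):

```python
from datetime import date

def count_between(client, start, end):
    # Parameters are passed separately; the driver substitutes them safely
    query = ('SELECT count() FROM events '
             'WHERE date >= %(start)s AND date <= %(end)s')
    [(count,)] = client.execute(query, {'start': start, 'end': end})
    return count

if __name__ == '__main__':
    from clickhouse_driver import Client  # requires a running server
    client = Client('localhost')
    print(count_between(client, date(2022, 1, 1), date(2022, 1, 31)))
```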
Selecting data with progress statistics
Streaming results
When you are dealing with large datasets, block-by-block streaming of results may be useful:
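A sketch using execute_iter (the query is illustrative; max_block_size is a real ClickHouse setting controlling block size):

```python
def stream_rows(client, query, block_size=10000):
    # execute_iter yields rows lazily, block by block, instead of
    # materialising the whole result set in memory
    settings = {'max_block_size': block_size}
    for row in client.execute_iter(query, settings=settings):
        yield row

if __name__ == '__main__':
    from clickhouse_driver import Client  # requires a running server
    client = Client('localhost')
    for row in stream_rows(client, 'SELECT number FROM system.numbers LIMIT 100000'):
        pass  # process one row at a time
```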
Inserting data
Insert queries in the Native protocol are a little tricky because of ClickHouse's columnar nature, and because we're using Python.
An INSERT query consists of two parts: the query statement and the query values. The values are split into chunks called blocks. Each block is sent in binary columnar form.
Since the data in each block is sent in binary form, we should not serialize it into a string using %(a)s substitution and then deserialize it back into Python types.
This INSERT would be extremely slow if executed with thousands of rows of data:
To insert data efficiently, provide the data separately, and end your statement with a VALUES clause:
You can use any iterable yielding lists, tuples or dicts.
If data is not passed, the connection will be terminated after a timeout.
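A sketch of an efficient insert (the table and columns are invented): the statement ends with VALUES and the data is passed separately, so each block travels in binary columnar form:

```python
from datetime import date

def rows_to_insert(n):
    # Any iterable of tuples, lists or dicts works; a generator avoids
    # building the whole dataset in memory first
    for i in range(n):
        yield (date(2022, 6, 13), i)

if __name__ == '__main__':
    from clickhouse_driver import Client  # requires a running server
    client = Client('localhost')
    client.execute('INSERT INTO t (d, x) VALUES', rows_to_insert(100000))
```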
The following WILL NOT work:
Of course, for INSERT … SELECT queries no data is needed:
ClickHouse will execute this query like a usual SELECT query.
DDL queries can be executed in the same way SELECT queries are executed:
This part of the documentation covers basic classes of the driver: Client, Connection and others.
Client
Client for communication with the ClickHouse server. A single connection is established for each connected instance of the client.
| Parameters: | settings – Dictionary of settings that is passed to every query. Defaults to None (no additional settings). See all available settings in the ClickHouse docs. |
|---|
Disconnects from the server.
execute(query, params=None, with_column_types=False, external_tables=None, query_id=None, settings=None, types_check=False, columnar=False)
Establishes a new connection if one hasn't been established yet. After query execution the connection remains intact for subsequent queries. If the connection can't be reused, it is closed and a new one is created.
execute_iter(query, params=None, with_column_types=False, external_tables=None, query_id=None, settings=None, types_check=False)
New in version 0.0.14.
execute_with_progress(query, params=None, with_column_types=False, external_tables=None, query_id=None, settings=None, types_check=False)
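A sketch of execute_with_progress: iterating the returned ProgressQueryResult yields (rows_read, total_rows) pairs, and get_result() then returns the rows (the query text is illustrative):

```python
def run_with_progress(client, query):
    progress = client.execute_with_progress(query)
    for rows_read, total_rows in progress:
        if total_rows:
            print('progress: %d/%d rows' % (rows_read, total_rows))
    # After the progress iterator is exhausted, fetch the actual rows
    return progress.get_result()

if __name__ == '__main__':
    from clickhouse_driver import Client  # requires a running server
    client = Client('localhost')
    print(run_with_progress(client, 'SELECT count() FROM system.numbers LIMIT 1000000'))
```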
Connection
Represents a connection between the client and the ClickHouse server.
Closes the connection between server and client. Frees resources: e.g. closes the socket.
QueryResult
Stores a query result from multiple blocks.
get_result()
| Returns: | Stored query result. |
|---|
ProgressQueryResult
Stores a query result and progress information from multiple blocks. Provides iteration over query progress.
get_result()
| Returns: | Stored query result. |
|---|
IterQueryResult
Provides iteration over returned data by chunks (streaming by chunks).
clickhouse-driver 0.2.4
pip install clickhouse-driver==0.2.4 Copy PIP instructions
Released: Jun 13, 2022
Python driver with native interface for ClickHouse
Navigation
Project links
Statistics
View statistics for this project via Libraries.io, or by using our public dataset on Google BigQuery
License: MIT License (MIT)
Tags ClickHouse, db, database, cloud, analytics
Requires: Python >=3.4
Classifiers
Project description
ClickHouse Python Driver
ClickHouse Python Driver with native (TCP) interface support.
Asynchronous wrapper is available here: https://github.com/mymarilyn/aioch
Features
Documentation
Usage
There are two ways to communicate with the server:
Pure Client example:
License
ClickHouse Python Driver is distributed under the MIT license.
Infinidat/infi.clickhouse_orm
This project is a simple ORM for working with the ClickHouse database. It allows you to define model classes whose instances can be written to the database and read back from it.
Let’s jump right in with a simple example of monitoring CPU usage. First we need to define the model class, connect to the database and create a table for the model:
Now we can collect usage statistics per CPU, and write them to the database:
Querying the table is easy, using either the query builder or raw SQL:
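The inline examples were lost in conversion; a condensed sketch in the spirit of the README's CPU-monitoring example (the model name, fields and database name are illustrative):

```python
from infi.clickhouse_orm import Database, Model, fields, engines

class CPUStats(Model):
    # One row per CPU per sample
    timestamp = fields.DateTimeField()
    cpu_id = fields.UInt16Field()
    cpu_percent = fields.Float32Field()
    engine = engines.Memory()

if __name__ == '__main__':
    db = Database('demo')  # connects over HTTP, default http://localhost:8123
    db.create_table(CPUStats)
    db.insert([CPUStats(timestamp='2022-06-13 00:00:00', cpu_id=0, cpu_percent=42.5)])
    # Query builder: samples for one CPU
    for row in CPUStats.objects_in(db).filter(cpu_id=0):
        print(row.timestamp, row.cpu_percent)
```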
This and other examples can be found in the examples folder.
To learn more please visit the documentation.
Async and multithreading
Every ClickHouse query is assigned an identifier to enable request execution tracking. However, the ClickHouse native protocol is synchronous: all incoming queries are executed consecutively. Clickhouse-driver does not yet implement a connection pool.
To utilize ClickHouse's asynchronous capability you should either use multiple Client instances or implement a queue.
The same applies to multithreading. Queries from different threads can't share one Client instance with a single connection. You should use a separate client per thread.
However, if you are using the DB API for communication with the server, each cursor creates its own Client instance. This makes communication thread-safe.
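The one-client-per-thread rule above can be sketched as follows (the factory indirection is mine, to keep the orchestration separate from the driver):

```python
import threading

def run_parallel(client_factory, queries):
    # One Client per thread: a single Client holds a single connection
    # and must not be shared across threads
    results = [None] * len(queries)

    def worker(i, q):
        client = client_factory()  # fresh client (and connection) per thread
        results[i] = client.execute(q)

    threads = [threading.Thread(target=worker, args=(i, q))
               for i, q in enumerate(queries)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

if __name__ == '__main__':
    from clickhouse_driver import Client  # requires a running server
    print(run_parallel(lambda: Client('localhost'), ['SELECT 1', 'SELECT 2']))
```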
DB API 2.0
This part of the documentation covers the driver's DB API.
clickhouse_driver.dbapi.connect(dsn=None, host=None, user='default', password='', port=9000, database='default', **kwargs)
Create a new database connection.
The connection can be specified via DSN:
or using database and credentials arguments:
The basic connection parameters are:
See defaults in the Connection constructor.
DSN or host is required.
Any other keyword parameter will be passed to the underlying Connection class.
| Returns: | a new connection. |
|---|
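A sketch of both connection styles and the standard DB API flow (the DSN and credentials are placeholders):

```python
def server_version(conn):
    # Standard DB API 2.0 flow: connection -> cursor -> execute -> fetch
    cursor = conn.cursor()
    cursor.execute('SELECT version()')
    (version,) = cursor.fetchone()
    return version

if __name__ == '__main__':
    from clickhouse_driver import dbapi  # requires a running server
    # Connect via DSN...
    conn = dbapi.connect(dsn='clickhouse://default:@localhost:9000/default')
    # ...or via discrete arguments:
    # conn = dbapi.connect(host='localhost', user='default', database='default')
    print(server_version(conn))
```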
exception clickhouse_driver.dbapi.Warning
exception clickhouse_driver.dbapi.Error
exception clickhouse_driver.dbapi.DataError
exception clickhouse_driver.dbapi.DatabaseError
exception clickhouse_driver.dbapi.ProgrammingError
exception clickhouse_driver.dbapi.IntegrityError
exception clickhouse_driver.dbapi.InterfaceError
exception clickhouse_driver.dbapi.InternalError
exception clickhouse_driver.dbapi.NotSupportedError
exception clickhouse_driver.dbapi.OperationalError
Each exception class provides with_traceback(tb) – set self.__traceback__ to tb and return self.
Connection
Creates a new Connection for accessing a ClickHouse database.
A Connection is just a wrapper for handling multiple cursors (clients) and does not initiate an actual connection to the ClickHouse server.
Close the connection now. The connection will be unusable from this point forward; an Error (or subclass) exception will be raised if any operation is attempted with the connection. The same applies to all cursor objects trying to use the connection.
Do nothing, since ClickHouse has no transactions.
cursor()
| Returns: | a new Cursor object using the connection. |
|---|
rollback()
Do nothing, since ClickHouse has no transactions.
Cursor
Close the cursor now. The cursor will be unusable from this point forward; an Error (or subclass) exception will be raised if any operation is attempted with the cursor.
Prepare and execute a database operation (query or command).
executemany(operation, seq_of_parameters)
Fetch all (remaining) rows of a query result, returning them as a sequence of sequences (e.g. a list of tuples).
| Returns: | list of fetched rows. |
|---|
fetchmany(size=None)
Fetch the next set of rows of a query result, returning a sequence of sequences (e.g. a list of tuples). An empty sequence is returned when no more rows are available.
| Parameters: | size – number of rows to return. |
|---|---|
| Returns: | list of fetched rows or an empty list. |
fetchone()
Fetch the next row of a query result set, returning a single sequence, or None when no more data is available.
Adds an external table to the cursor context.
If the same table is specified more than once, the last one is used.
set_query_id(query_id)
Specifies the query identifier for the cursor.
| Parameters: | query_id – the query identifier. |
|---|---|
| Returns: | None |
set_settings(settings)
Specifies settings for the cursor.
| Parameters: | settings – dictionary of query settings |
|---|---|
| Returns: | None |
set_stream_results(stream_results, max_row_buffer)
Toggles results streaming from the server. The driver will consume blocks of up to max_row_buffer rows and yield rows one by one from each block.
set_types_check(types_check)
Toggles type checking for sequences of INSERT parameters. Disabled by default.
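A sketch combining set_stream_results with the standard fetchone loop (the buffer size and query are arbitrary):

```python
def stream_query(conn, query, max_row_buffer=10000):
    cursor = conn.cursor()
    # Stream blocks of up to max_row_buffer rows instead of buffering
    # the entire result set on the client
    cursor.set_stream_results(True, max_row_buffer)
    cursor.execute(query)
    while True:
        row = cursor.fetchone()
        if row is None:
            break
        yield row

if __name__ == '__main__':
    from clickhouse_driver import dbapi  # requires a running server
    conn = dbapi.connect(host='localhost')
    for row in stream_query(conn, 'SELECT number FROM system.numbers LIMIT 100000'):
        pass  # process one row at a time
```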
gavinln/clickhouse-test
This project provides an Ubuntu (20.04) Vagrant Virtual Machine (VM) with ClickHouse. ClickHouse is an open-source column-oriented DBMS (columnar database management system) for online analytical processing (OLAP).
There are Ansible scripts that automatically install the software when the VM is started.
Setup the machine
All the software installed exceeds the standard 10GB size of the virtual machine disk. Install the following plugin to resize the disk.
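The plugin name was lost in conversion; vagrant-disksize is a commonly used plugin for growing a VM disk, though this project may use a different one:

```shell
vagrant plugin install vagrant-disksize
```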
Percent symbols in inlined constants must be doubled if you mix constants containing the % symbol with %(x)s parameters.
Customisation of SELECT output with the FORMAT clause is not supported.
Inserting data in different formats with the FORMAT clause is not supported. See Inserting data from CSV file if you need to insert data in a custom format.
Welcome to clickhouse-driver
Welcome to clickhouse-driver's documentation. Get started with Installation and then get an overview with the Quickstart where common queries are described.
User's Guide
This part of the documentation focuses on step-by-step instructions for development with clickhouse-driver.
Clickhouse-driver is designed to communicate with a ClickHouse server from Python over the native protocol.
ClickHouse server provides two protocols for communication: the HTTP protocol and the Native (TCP) protocol.
Each protocol has its own advantages and disadvantages. Here we focus on the advantages of the native protocol:
There is an asynchronous wrapper for clickhouse-driver: aioch. It's available here.
API Reference
If you are looking for information on a specific function, class or method, this part of the documentation is for you.
Additional Notes
Legal information, changelog and contributing are here for the interested.
Extremely slow on large select, http protocol almost 10 times faster #32
dmitriyshashkin commented Mar 20, 2018
It seems that selecting large datasets using the native client is extremely slow. Here is my benchmark https://gist.github.com/dmitriyshashkin/6a4849bdcf882ba340cdfbc1990da401
Initially, I’ve encountered this behavior on my own dataset, but I was able to reproduce it using the dataset and the structure described here https://clickhouse.yandex/docs/en/getting_started/example_datasets/ontime/
To simplify things a little bit I’ve used the data for just one month: http://transtats.bts.gov/PREZIP/On_Time_On_Time_Performance_2017_12.zip
As you can see, the fastest way to get the data is via the HTTP protocol with requests and pandas. The problem gets worse as the number of rows grows; on my own dataset with 5M rows I waited for 1 hour before I had to interrupt the process. The bottleneck is not CH itself: the "top" command shows that all the work is done by Python at 100% CPU utilization, while CH is almost idle.
xzkostyan commented Mar 21, 2018
I haven't tried to play with the provided data yet. But here is the explanation of the speed loss.
The HTTP client returns plain text (CSV) that has to be parsed, for example with pandas. The native client returns Python types.
Pandas is a compiled library (correct me if I'm wrong); this driver is written in pure Python (except for the compression and hashing libraries).
I'll try to cythonize some bottlenecks in the source code. The main bottleneck is result transposition from columnar to row-like form.
If your data processing is OK with columnar form, you can specify the columnar=True parameter in the execute call. This will give a significant speedup.
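The columnar=True tip above can be sketched as follows (the query is illustrative):

```python
def fetch_columns(client, query):
    # columnar=True skips transposing blocks into rows: the result is a
    # list of columns, each a sequence of values
    return client.execute(query, columnar=True)

if __name__ == '__main__':
    from clickhouse_driver import Client  # requires a running server
    client = Client('localhost')
    print(fetch_columns(client, 'SELECT number, number * 2 FROM system.numbers LIMIT 3'))
```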
Development
Test configuration
In setup.cfg you can find the ClickHouse server port, credentials, logging level and other options that can be tuned during local testing.
Running tests locally
Install the desired Python version with your system package manager/pyenv/another manager.
Install test requirements and build the package:
You should install Cython if you want to change *.pyx files:
ClickHouse on host machine
Install the desired versions of clickhouse-server and clickhouse-client on your machine.
ClickHouse in Docker
Create a container with the desired version of clickhouse-server:
Create a container with the same version of clickhouse-client:
Create a clickhouse-client script on your host machine:
After that, the test-clickhouse-client container will communicate with test-clickhouse-server transparently from the host machine.
Add entry in hosts file:
Set TZ=UTC and run tests:
GitHub Actions in forked repository¶
Workflows in forked repositories can be used for running tests.
Workflows don’t run in forked repositories by default. You must enable GitHub Actions in the Actions tab of the forked repository.
Installation¶
Python Version¶
Clickhouse-driver supports Python 3.4 and newer and PyPy.
Build Dependencies¶
Example for the python:alpine docker image:
By default there are wheels for Linux, Mac OS X and Windows.
Packages for Linux and Mac OS X are available for Python 3.4 – 3.9.
Packages for Windows are available for Python 3.5 – 3.9.
Dependencies¶
These distributions will be installed automatically when installing clickhouse-driver.
Optional dependencies¶
These distributions will not be installed automatically. Clickhouse-driver will detect and use them if you install them.
Installation from PyPI¶
The package can be installed using pip:
You can install extra packages if you need compression support. Example of LZ4 compression requirements installation:
You can also specify multiple extras by separating them with commas. Install LZ4 and ZSTD requirements:
NumPy support¶
You can install additional packages (NumPy and Pandas) if you need NumPy support:
Supported NumPy versions are limited by the numpy package’s own Python support.
Installation from GitHub¶
The development version can be installed directly from GitHub:
Welcome to clickhouse-driver¶
Welcome to clickhouse-driver’s documentation. Get started with Installation and then get an overview with the Quickstart, where common queries are described.
User’s Guide¶
This part of the documentation focuses on step-by-step instructions for development with clickhouse-driver.
Clickhouse-driver is designed to communicate with a ClickHouse server from Python over the native protocol.
The ClickHouse server provides two protocols for communication: the HTTP protocol and the Native (TCP) protocol.
Each protocol has its own advantages and disadvantages. Here we focus on the advantages of the native protocol:
There is an asynchronous wrapper for clickhouse-driver: aioch. It’s available here.
API Reference¶
If you are looking for information on a specific function, class or method, this part of the documentation is for you.
Additional Notes¶
Legal information, changelog and contributing are here for the interested.
Performance¶
This section compares clickhouse-driver performance over the native interface with the TSV and JSONEachRow formats available over the HTTP interface.
clickhouse-driver returns already parsed row items as Python data types; the driver performs all transformations for you.
When you read data over HTTP you may need to cast strings into Python types.
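For example, a single TSV row fetched over HTTP arrives as one string, and each field must be cast by hand (the column layout here is a made-up illustration, not the ontime schema):

```python
# One TSV line as returned by the HTTP interface (illustrative columns):
line = "2017-12-01\tAA\t1234\n"

# Splitting gives only strings; numeric fields need explicit casts:
flight_date, carrier, flight_num = line.rstrip("\n").split("\t")
row = (flight_date, carrier, int(flight_num))
print(row)  # ('2017-12-01', 'AA', 1234)
```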
Test data¶
Sample data for testing is taken from the ClickHouse docs.
Create the database and table:
Download some data for the year 2017:
Insert the data into ClickHouse:
Required packages¶
For fast JSON parsing we’ll use the ujson package:
Versions¶
Machine: Linux ThinkPad-T460 4.4.0-177-generic #207-Ubuntu SMP Mon Mar 16 01:16:10 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Python: CPython 3.6.5 (default, May 30 2019, 14:48:31) [GCC 5.4.0 20160609]
Benchmarking¶
The scripts below can be benchmarked with the following one-liner:
Timing will measure:
Plain text without parsing¶
Let’s take the plain-text response from the ClickHouse server as a baseline.
Fetching unparsed data with pure requests (1)
Parsed rows¶
A line split into elements will be considered “parsed” for the TSV format (2)
Now we cast each element to its data type (2.5)
The JSONEachRow format can be loaded with json loads (3)
Get fully parsed rows with clickhouse-driver in Native format (4)
Iteration over rows¶
Iteration over TSV (5)
Now we cast each element to its data type (5.5)
Iteration over JSONEachRow (6)
Iteration over rows with clickhouse-driver in Native format (7)
Iteration over string rows¶
OK, but what if we need only string columns?
Iteration over TSV (8)
Iteration over JSONEachRow (9)
Iteration over string rows with clickhouse-driver in Native format (10)
Iteration over int rows¶
Iteration over TSV (11)
Iteration over JSONEachRow (12)
Iteration over int rows with clickhouse-driver in Native format (13)
Results¶
This table contains memory and timing benchmark results of snippets above.
JSON in table is shorthand for JSONEachRow.
| Rows | 50k | 131k | 217k | 450k | 697k |
|---|---|---|---|---|---|
| Plain text without parsing: timing | |||||
| Naive requests.get TSV (1) | 0.40 s | 0.67 s | 0.95 s | 1.67 s | 2.52 s |
| Naive requests.get JSON (1) | 0.61 s | 1.23 s | 2.09 s | 3.52 s | 5.20 s |
| Plain text without parsing: memory | |||||
| Naive requests.get TSV (1) | 49 MB | 107 MB | 165 MB | 322 MB | 488 MB |
| Naive requests.get JSON (1) | 206 MB | 564 MB | 916 MB | 1.83 GB | 2.83 GB |
| Parsed rows: timing | |||||
| requests.get TSV (2) | 0.81 s | 1.81 s | 3.09 s | 7.22 s | 11.87 s |
| requests.get TSV with cast (2.5) | 1.78 s | 4.58 s | 7.42 s | 16.12 s | 25.52 s |
| requests.get JSON (3) | 2.14 s | 5.65 s | 9.20 s | 20.43 s | 31.72 s |
| clickhouse-driver Native (4) | 0.73 s | 1.40 s | 2.08 s | 4.03 s | 6.20 s |
| Parsed rows: memory | |||||
| requests.get TSV (2) | 171 MB | 462 MB | 753 MB | 1.51 GB | 2.33 GB |
| requests.get TSV with cast (2.5) | 135 MB | 356 MB | 576 MB | 1.15 GB | 1.78 GB |
| requests.get JSON (3) | 139 MB | 366 MB | 591 MB | 1.18 GB | 1.82 GB |
| clickhouse-driver Native (4) | 135 MB | 337 MB | 535 MB | 1.05 GB | 1.62 GB |
| Iteration over rows: timing | |||||
| requests.get TSV (5) | 0.49 s | 0.99 s | 1.34 s | 2.58 s | 4.00 s |
| requests.get TSV with cast (5.5) | 1.38 s | 3.38 s | 5.40 s | 10.89 s | 16.59 s |
| requests.get JSON (6) | 1.89 s | 4.73 s | 7.63 s | 15.63 s | 24.60 s |
| clickhouse-driver Native (7) | 0.62 s | 1.28 s | 1.93 s | 3.68 s | 5.54 s |
| Iteration over rows: memory | |||||
| requests.get TSV (5) | 19 MB | 19 MB | 19 MB | 19 MB | 19 MB |
| requests.get TSV with cast (5.5) | 19 MB | 19 MB | 19 MB | 19 MB | 19 MB |
| requests.get JSON (6) | 20 MB | 20 MB | 20 MB | 20 MB | 20 MB |
| clickhouse-driver Native (7) | 56 MB | 70 MB | 71 MB | 71 MB | 71 MB |
| Iteration over string rows: timing | |||||
| requests.get TSV (8) | 0.40 s | 0.67 s | 0.80 s | 1.55 s | 2.18 s |
| requests.get JSON (9) | 1.14 s | 2.64 s | 4.22 s | 8.48 s | 12.96 s |
| clickhouse-driver Native (10) | 0.46 s | 0.91 s | 1.35 s | 2.49 s | 3.67 s |
| Iteration over string rows: memory | |||||
| requests.get TSV (8) | 19 MB | 19 MB | 19 MB | 19 MB | 19 MB |
| requests.get JSON (9) | 20 MB | 20 MB | 20 MB | 20 MB | 20 MB |
| clickhouse-driver Native (10) | 46 MB | 56 MB | 57 MB | 57 MB | 57 MB |
| Iteration over int rows: timing | |||||
| requests.get TSV (11) | 0.84 s | 2.06 s | 3.22 s | 6.27 s | 10.06 s |
| requests.get JSON (12) | 0.95 s | 2.15 s | 3.55 s | 6.93 s | 10.82 s |
| clickhouse-driver Native (13) | 0.43 s | 0.61 s | 0.86 s | 1.53 s | 2.27 s |
| Iteration over int rows: memory | |||||
| requests.get TSV (11) | 19 MB | 19 MB | 19 MB | 19 MB | 19 MB |
| requests.get JSON (12) | 20 MB | 20 MB | 20 MB | 20 MB | 20 MB |
| clickhouse-driver Native (13) | 41 MB | 48 MB | 48 MB | 48 MB | 49 MB |
Conclusion¶
If you need to get a significant number of rows from the ClickHouse server as text, then the TSV format is your choice; see the “Iteration over string rows” results.
It doesn’t matter which interface you use if you manipulate a small number of rows.
clickhouse-driver
Python driver with native interface for ClickHouse
Popularity
The PyPI package clickhouse-driver receives a total of 293,670 downloads a week. As such, we scored clickhouse-driver popularity level to be Influential project.
Based on project statistics from the GitHub repository for the PyPI package clickhouse-driver, we found that it has been starred 900 times, and that 0 other projects in the ecosystem are dependent on it.
The download numbers shown are the average weekly downloads from the last 6 weeks.
Maintenance
Further analysis of the maintenance status of clickhouse-driver based on released PyPI versions cadence, the repository activity, and other data points determined that its maintenance is Healthy.
We found that clickhouse-driver demonstrates a positive version release cadence with at least one new version released in the past 3 months.
As a healthy sign for on-going project maintenance, we found that the GitHub repository had at least 1 pull request or issue interacted with by the community.
Community
With more than 10 contributors to the clickhouse-driver repository, this is possibly a sign of a growing and inviting community.
Installation¶
Python Version¶
Clickhouse-driver supports Python 3.4 and newer, Python 2.7, and PyPy.
Build Dependencies¶
Example for the python:alpine docker image:
By default there are wheels for Linux, Mac OS X and Windows.
Packages for Linux and Mac OS X are available for Python 2.7 and 3.4 – 3.9.
Packages for Windows are available for Python 2.7 and 3.5 – 3.9.
Dependencies¶
These distributions will be installed automatically when installing clickhouse-driver.
Optional dependencies¶
These distributions will not be installed automatically. Clickhouse-driver will detect and use them if you install them.
Installation from PyPI¶
The package can be installed using pip:
You can install extra packages if you need compression support. Example of LZ4 compression requirements installation:
You can also specify multiple extras by separating them with commas. Install LZ4 and ZSTD requirements:
NumPy support¶
You can install additional packages (NumPy and Pandas) if you need NumPy support:
Supported NumPy versions are limited by the numpy package’s own Python support.
Installation from GitHub¶
The development version can be installed directly from GitHub:
ClickHouse/dbt-clickhouse
This plugin ports dbt functionality to Clickhouse.
We do not test against older versions of Clickhouse. The plugin uses syntax that requires version 22.1 or newer.
Use your favorite Python package manager to install the app from PyPI, e.g.
| Option | Description | Required? |
|---|---|---|
| engine | The table engine (type of table) to use when creating tables | Optional (default: MergeTree() ) |
| order_by | A tuple of column names or arbitrary expressions. This allows you to create a small sparse index that helps find data faster. | Optional (default: tuple() ) |
| partition_by | A partition is a logical combination of records in a table by a specified criterion. The partition key can be any expression from the table columns. | Optional |
| unique_key | A tuple of column names that uniquely identify rows. For more details on uniqueness constraints, see here. | Optional |
| inserts_only | This property is relevant only for incremental materialization. If set to True, incremental updates will be inserted directly into the target table without creating an intermediate table. This option has the potential to significantly improve performance and avoid memory limitations on big updates. | Optional |
| settings | A dictionary with custom settings for INSERT INTO and CREATE AS SELECT queries. | Optional |
Note: The only feature that is not supported and not tested is Ephemeral materialization.
Tests running command: pytest tests/integration
You can customize a few test params through environment variables. In order to provide custom params you’ll need to create a test.env file under the project root (remember not to commit this file!) and define the following env variables inside:
ClickHouse wants to thank @silentsokolov for creating this connector and for their valuable contributions.
About
The Clickhouse plugin for dbt (data build tool)
ClickHouse Python Driver with native interface support
Related tags
Overview
ClickHouse Python Driver
ClickHouse Python Driver with native (TCP) interface support.
Asynchronous wrapper is available here: https://github.com/mymarilyn/aioch
There are two ways to communicate with server:
Pure Client example:
ClickHouse Python Driver is distributed under the MIT license.
Issues
Fix null value on bytestring columns
When the client setting strings_as_bytes is set, the driver crashes when inserting None values into columns of type Nullable([Fixed]String):
fallback for os_name if user name is not defined
Got this error while running inside a docker container (no user entry for such uid)
Add max_partitions_per_insert_block to settings.available
The max_partitions_per_insert_block is defined in: https://github.com/yandex/ClickHouse/blob/f566182582c70986be19777b3583c803607928ad/dbms/src/Core/Settings.h#L315
How to speed up inserts from pandas dataframe?
I have a pandas dataframe on my laptop with a few million records. I am inserting them into a clickhouse table with: client.execute('insert into database.table (col1, col2…, coln) values', df.values.tolist())
After execution of this command I looked at the laptop’s network activity.
As you can see, network activity peaks at 12 Mbps, with lows at 6 Mbps. Such activity takes quite a long time, and then at one moment the laptop’s network send goes up to 100 Mbps for a short period of time and the insert is over.
Can someone explain how insert in clickhouse-driver works? Why is the data not going to the clickhouse server at top network speed?
I tried to play with settings like max_insert_block_size or insert_block_size, but with no success. Are there any clickhouse server parameters that could improve the speed of inserts?
What would be the fastest way to insert a pandas dataframe into a clickhouse table?
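One commonly suggested approach (a sketch, not necessarily the fastest for every schema) is to hand the driver columns instead of millions of per-row tuples; the table and column names are taken from the question, and the client call assumes a live server:

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3], 'col2': ['a', 'b', 'c']})

# Build one list per column instead of one tuple per row,
# avoiding the row-tuple materialization of df.values.tolist():
columns = [df[name].tolist() for name in df.columns]
print(columns)  # [[1, 2, 3], ['a', 'b', 'c']]

# With a live server (assumption: database.table exists with these columns):
#   client.execute('INSERT INTO database.table (col1, col2) VALUES',
#                  columns, columnar=True)
```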
Enum option parsing is not handling all supported characters correctly
When querying a table with Enum options containing a comma and a space, the parsing of the options fails (see below).
With an example table as
the options are a bit non-standard but seem to actually be permitted. (I kind of created these options by accident due to a typo in a query and then figured out that the parsing could be improved on this.)
And while testing the original parsing, I’ve also noticed that it is not really handling any empty characters before the first option; they get prepended to the first option, e.g., Enum8( ‘one’ = 1, ‘exa»mple’ = 2, ‘three’ = 3) is turned into {" 'one": 1, 'exa"mple': 2, 'three': 3}, which doesn’t seem right either.
Do you agree that it makes sense to fix the options parsing? Should I add some tests for it?
In addition I’ve added escaping of single quotes into the generated error message.
Get progress info
It seems the Progress packets are received and managed, but there is no way to get the info from the Client or Connection objects. Here is an API proposal with a fetch* method, which is common in database APIs.
Last query’s profile info
Hi, is there an easy way to read
I have tried reading query profile info but I could only get
I am missing 3 more measurements
Question:
BTW1: I can process the system.query_log, but this is a cumbersome approach. BTW2: This could be a nice feature to have, i.e. adding an option to display profile info from Client.execute()
Wrong DateTime insert
After inserting datetime.datetime(2018, 1, 19, 10) through this driver I see the value ‘2018-01-19 13:00:00’ in the table. The timezone on my computer and on the clickhouse server is Moscow.
What must I do to see ‘2018-01-19 10:00:00’ after insert?
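One common workaround (a sketch; whether it applies depends on the driver version and the column type) is to pass timezone-aware datetimes so there is no ambiguity about which instant is meant:

```python
from datetime import datetime
from zoneinfo import ZoneInfo  # Python 3.9+

# An aware value pins the wall-clock time to Moscow explicitly:
aware = datetime(2018, 1, 19, 10, tzinfo=ZoneInfo('Europe/Moscow'))
print(aware.isoformat())  # 2018-01-19T10:00:00+03:00

# With a live server (table and column names here are hypothetical):
#   client.execute('INSERT INTO events (ts) VALUES', [(aware,)])
```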
Feature request: Extend columnar form to support NumPy / PyArrow arrays
As far as I understand there are two ways to do this: either turn Python tuples into NumPy arrays, if possible with zero copy, or do the transformation directly on the binary data.
The bonus of this will be that another zero-copy transformation of NumPy arrays to PyArrow arrays can easily be done. This way we easily gain two significant advantages:
Use pyarrow for batch processing, table and pandas dataframe zero-copy transformations; it’s blazing fast and memory efficient.
It opens the door to the Arrow Flight protocol (gRPC based), which can be great for transferring data at high speed from remote servers.
I would also like to use columnar forms with NumPy arrays for my project and I am offering testing.
expected Hello or Exception, got Unknown packet
Describe the bug Client throws this error when running queries.
To Reproduce
Versions Python 3.9.6 clickhouse-driver built from commit 78e389e36d20744c236c546ee01ee76d5bc5fb35 Clickhouse server version 21.10.1 revision 54449
how to connect to remote clickhouse server
Actually I think it’s a really useful tool, but the documentation is so poor. The introduction provides both client and connector examples; however, all of these are toy examples, just as below:
client = Client('localhost') conn = connect('clickhouse://localhost')
How to build a connection in a real production environment is not mentioned. What about remote setups? How do you configure clickhouse-server? Which parameters are needed for the client and connector APIs? None of these are clearly provided. So I think this project is built from your daily work, but the project has got 0.5k stars; I got confused.
a better choice is https://github.com/ClickHouse/clickhouse-go
Boolean data type upload problem
Describe the bug Hi, I upload some data to clickhouse with clickhouse-driver. My data types include Boolean; the Python script runs successfully, but the data in my database is not correct.
The error is as below:
To Reproduce Minimal piece of Python code that reproduces the problem. CREATE TABLE IF NOT EXISTS paper ( has_inbound_citations Nullable(Bool), has_outbound_citations Nullable(Bool) ) engine = Memory
INSERT INTO paper (has_inbound_citations, has_outbound_citations) VALUES
Expected behavior The values should appear as true or false in the database, but errors are raised instead.
Versions
python 3.10 clickhouse-driver 0.2.4 SELECT version()
Insert dataframe writes max datetime (2106-02-07) when its None in df
Describe the bug When I insert a pandas dataframe with null/None/np.nan datetime columns, I get the max datetime value in clickhouse (2106-02-07), although I need 1970-01-01.
To Reproduce
Expected behavior It should return 1970-01-01 instead of 2106-02-07
Versions
clickhouse-driver 0.2.4 clickhouse 22.3.9.19 python 3.9.7
Failed enum insert when types_check enabled
Describe the bug In the documentation it is written that the supported types for enum inserts are: Enum, int, long, str/basestring. https://clickhouse-driver.readthedocs.io/en/latest/types.html#enum8-16 Executing the example from the documentation works fine, except when types_check is enabled; then it fails with the following msg:
To Reproduce
Expected behavior insert without error
Versions
query_dataframe returns empty dataframe with shape (0, 0) instead of returning shape (0, number of columns)
Describe the bug When the query returns 0 rows, the function returns an empty dataframe with shape (0, 0), without specifying any columns from the query.
IMHO, in that case it should return a dataframe with shape (0, # of columns).
To Reproduce
Expected behavior
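A workaround sketch while the issue stands: fetch with with_column_types=True and build the frame from the column metadata yourself. The driver call is simulated below, and the column names/types are illustrative:

```python
import pandas as pd

# Simulated result of client.execute(query, with_column_types=True)
# for a query that matched zero rows:
data, column_types = [], [('id', 'UInt64'), ('name', 'String')]

# The column metadata survives even when no rows do:
df = pd.DataFrame(data, columns=[name for name, _ in column_types])
print(df.shape)  # (0, 2)
```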
insert_dataframe fails with keyerror for nullable columns
Describe the bug If the clickhouse table has some nullable columns, and we don’t include those columns in the data frame and try to upload it using the clickhouse-driver client.insert_dataframe, it fails with a KeyError.
To Reproduce
Expected behavior The API should write to the clickhouse table, leaving nullable columns with NULL values.
Versions
Stacktrace: 2022-06-15T09:43:29.384183066Z stderr F raise KeyError(key) from err
How to speed up inserts from pandas dataframe? #76
Comments
ghuname commented Feb 21, 2019
clickhouse-http-client 1.0.2
pip install clickhouse-http-client Copy PIP instructions
Released: Jun 30, 2021
clickhouse http client, author liyuanjun
Navigation
Project links
Statistics
View statistics for this project via Libraries.io, or by using our public dataset on Google BigQuery
License: MIT
Author: liyuanjun
Maintainers
Project description
clickhouse-http-client
clickhouse http client.
Install
Usage
ppodolsky/clickhouse-python
Models are defined in a way reminiscent of Django’s ORM:
The main object you are interacting with is Database:
Topology is just a special object wrapping hosts that also introduces priorities of hosts. How to prepare a topology is described in the next section. If necessary, you can specify credentials:
ClickHouse is optimized for bulk inserts, and we’ve implemented embedded buffering here to avoid single inserts. Every model (table) has its own buffer, and the buffer size defines how many instances of the model must be collected in the buffer before a real insert. If you need more predictable inserts, you can always use db.flush(), which sends all collected instances immediately, or even set buffer_size=0 to flush on every insert. Buffering is disabled by default; to use it you must set an appropriate buffer_size:
The rule of thumb for choosing a buffer size is to set a size such that the buffer would overflow about every second. The database client can be thread-safe. To get thread safety, use threaded=True when creating the Database object. You can create a separate thread to flush every second or insert in multiple threads.
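The buffering rule described above can be sketched in a few lines of plain Python (BufferedInserter and insert_fn are hypothetical names for illustration, not this library’s API):

```python
class BufferedInserter:
    """Collects rows and performs one bulk insert per buffer_size rows
    (sketch only; insert_fn is any callable doing the real bulk INSERT)."""

    def __init__(self, insert_fn, buffer_size=1000):
        self.insert_fn = insert_fn
        self.buffer_size = buffer_size  # 0 means flush on every insert
        self.buffer = []

    def insert(self, row):
        self.buffer.append(row)
        if len(self.buffer) >= self.buffer_size:
            self.flush()

    def flush(self):
        # Send whatever has accumulated as one bulk insert.
        if self.buffer:
            self.insert_fn(self.buffer)
            self.buffer = []

batches = []
ins = BufferedInserter(batches.append, buffer_size=2)
for row in [(1,), (2,), (3,)]:
    ins.insert(row)
ins.flush()  # send the leftover row immediately, like db.flush()
print(batches)  # [[(1,), (2,)], [(3,)]]
```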
Describing topology of ClickHouse cluster
This wrapper tends to support multi DC strategies. Topology can be described in the following format:
where the keys in the dictionary are priorities of the corresponding host lists; lower values mean higher priority. In the topology above, requests will always be sent to any of host1, host2, host3 (chosen randomly every time). Hosts with priority 2 will be involved only if all hosts with priority 1 go down.
Assuming there are two data centers DC-1 and DC-2 and code is running on a host in DC-1
There is a helper to produce topology in the required format from a more human readable format. Code below produces the same result as above:
ClickHouse and Python: Jupyter Notebooks
Jupyter Notebooks are an indispensable tool for sharing code between users in Python data science. For those unfamiliar with them, notebooks are documents that contain runnable code snippets mixed with documentation. They can invoke Python libraries for numerical processing, machine learning, and visualization. The code output includes not just text output but also graphs from powerful libraries like matplotlib and seaborn. Notebooks are so ubiquitous that it’s hard to think of manipulating data in Python without them.
ClickHouse support for Jupyter Notebooks is excellent. I have spent the last several weeks playing around with Jupyter Notebooks using two community drivers: clickhouse-driver and clickhouse-sqlalchemy. The results are now published on Github at https://github.com/Altinity/clickhouse-python-examples. The remainder of this blog contains tips to help you integrate ClickHouse data to your notebooks.
Driver Installation
You can run Jupyter Notebooks directly from the command line but like most people I run them using Anaconda. We’ll assume you know how to run Jupyter from Anaconda Navigator. (If not, read the Anaconda docs and come back.) To use the ClickHouse drivers you’ll want to run conda commands similar to the following to bring them into your environment. This example uses the ‘base’ environment.
Now when you start Jupyter with the ‘base’ environment you’ll have ClickHouse drivers available for import. Tip: you can run these commands to load modules while Jupyter is already running. I do this regularly to top up missing libraries.
There are other Python drivers available such as the sqlalchemy-clickhouse driver developed by Marek Vavrusa and others. However, the drivers shown above are available on conda-forge which makes them easy to use with Anaconda.
So much for installation. Let’s put the drivers to use.
Shortest Path to Data
The easiest way to work on data from ClickHouse is via the SQLAlchemy %sql magic function. There is a sample notebook that shows how to do this easily. For now let’s step through the recipe, since this is likely to be the most common way users access data from ClickHouse.
First, let’s load SQLAlchemy and enable the %sql function.
Next, let’s connect to ClickHouse and fetch data from the famous Iris data set into a pandas data frame. The last command shows the end of the frame so we can confirm it has data.
Finally, let’s create a nice scatter graph with some of the data. This code is the most complex by far but generates a nice picture showing the overlap between characteristics of the three Iris species.
The result is the very satisfactory graph shown below.
For more details and to run the sample yourself check out the source notebook file.
Translating Data Types
One of the issues you’ll need to watch for in your own work is ensuring that pandas data frames have correct data types, especially numbers. If your SQL schema sticks with ints and floats, values will convert easily in result sets. More specialized types like Decimal do not automatically convert to numeric types, which means that libraries like matplotlib and scikit-learn won’t be able to use them correctly. Here’s an example of properly conforming DDL for the iris table:
It’s a good idea to run DataFrame.describe() on data frames created from SQL to ensure you got it right and that values have the expected types.
The key thing to check is that numeric columns are really numbers and not ‘object’ or ‘str’ values. You’ll of course notice problems as soon as you try to put values in a graph or feed them to numerical libraries. For example, matplotlib does not correctly plot X and Y axes for non-numeric data. That said, the root cause can be confusing to diagnose if you have not seen it before.
Pandas has methods that allow you to patch up mismatched types but it’s easier to get things right in the schema to begin with.
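As a sketch of what such a patch-up looks like (the column names are borrowed from the Iris example; this is not the notebook’s actual code), pd.to_numeric converts a column that arrived as strings:

```python
import pandas as pd

# A frame where a numeric column came back as strings
# (as happens with Decimal columns in some drivers).
df = pd.DataFrame({
    "sepal_length": ["5.1", "4.9", "6.3"],
    "species": ["setosa", "setosa", "virginica"],
})

# Coerce the numeric column; errors="coerce" turns bad values into NaN
# instead of raising.
df["sepal_length"] = pd.to_numeric(df["sepal_length"], errors="coerce")

print(df.dtypes)
```

Running DataFrame.describe() afterward confirms the column now participates in numeric summaries.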
Direct Use of ClickHouse Drivers
The %sql function is great if you are just accessing data and need to get it into a data frame. But what if you want to do more than just look at query results? %sql cannot run DDL or insert values. In this case you can import clickhouse-driver and clickhouse-alchemy entities and call them directly from notebook code. Here’s a trivial example:
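The original snippet is not reproduced here, but a minimal sketch of direct clickhouse-driver use might look like the following. The query result is faked so the conversion logic runs without a live server; the table and column names are hypothetical:

```python
import pandas as pd

# clickhouse_driver's Client.execute(sql, with_column_types=True) returns a
# (rows, columns) pair: rows is a list of tuples, columns a list of
# (name, type_name) pairs.  Against a real server you would write:
#   from clickhouse_driver import Client
#   client = Client(host='localhost')
#   rows, columns = client.execute('SELECT id, species FROM iris',
#                                  with_column_types=True)
rows = [(1, 'setosa'), (2, 'virginica')]
columns = [('id', 'Int32'), ('species', 'String')]

# Turn the driver result into a pandas data frame.
df = pd.DataFrame(rows, columns=[name for name, _type in columns])
print(df.shape)
```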
We documented use of the clickhouse-driver in depth in a previous Altinity blog article. You can look there for a general overview of the driver. The EX-1.0-Getting-to-Know-the-Clickhouse-driver-Client.ipynb notebook contains samples showing how to run DDL, select data, and load CSV.
Use of the clickhouse-sqlalchemy driver is illustrated in the EX-2-ClickHouse-SQL-Alchemy.ipynb notebook. We have not done a full review of the driver, but based on initial experience it seems to work as well as the clickhouse-driver module, on which it depends. The main committer is Konstantin Lebedev (@xzkostyan), who also developed clickhouse-driver. You can also look at the documentation in the Github project. Between the notebook samples and the project README, users who have previously used SQLAlchemy should have little problem understanding it.
Relatively few problems popped up during notebook development. I have not run into driver operations that work elsewhere but fail in Jupyter. Driver behavior in Jupyter appears 100% equivalent to running Python3 from the command line. We expect this of course but it’s still good when it happens. The most interesting problems so far were related to data conversions, which are a typical integration issue.
Lessons from Jupyter and ClickHouse
There is a natural symbiosis between ClickHouse and Python libraries like pandas and scikit-learn. Notebooks are very helpful for exploring the relationship in a systematic way.
Over the last few weeks I have noticed ways to combine capabilities from both sides effectively. Here are two simple examples that popped up relating to pandas data frames.
Going from SQL to Pandas. Data frames can manipulate data in ways that are difficult to do in ClickHouse. For example, you can select normalized data from ClickHouse into a data frame, then use the DataFrame.pivot_table() method to pivot rows and columns. See the EX-4-Pivot-Using-SQL-And-Pandas.ipynb notebook for an example of how to do this.
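As a rough sketch (with made-up columns, not the notebook’s actual data), pivoting long-format rows from a SQL result into a wide frame looks like this:

```python
import pandas as pd

# Long-format rows as they might come back from a ClickHouse SELECT.
df = pd.DataFrame({
    "date":   ["2020-01-01", "2020-01-01", "2020-01-02", "2020-01-02"],
    "metric": ["clicks", "views", "clicks", "views"],
    "value":  [10, 100, 20, 200],
})

# Pivot: one row per date, one column per metric.
wide = df.pivot_table(index="date", columns="metric", values="value")
print(wide)
```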
Going from Pandas to SQL. I documented CSV loading in the clickhouse-driver using the csv.DictReader in my last blog article. It turns out that Pandas has a much better CSV reader than the native Python csv module. Among other things it converts numeric types automatically. This is now part of the clickhouse-driver notebook.
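A small sketch of the difference (the data here is made up): pandas’ read_csv infers numeric dtypes automatically, whereas csv.DictReader yields every field as a string:

```python
import io
import pandas as pd

# A CSV payload; with csv.DictReader both columns would come back as str.
csv_text = "id,price\n1,9.99\n2,19.50\n"

# pandas infers int64 for id and float64 for price automatically.
df = pd.read_csv(io.StringIO(csv_text))
print(df.dtypes)
```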
I’m sure there are many other ways to use Jupyter Notebook creatively with ClickHouse. If you have additional samples or see problems with those already there, please submit a PR on Github. Having a centrally located library of nice Python samples for ClickHouse will help all users.
whisklabs/airflow-clickhouse-plugin
Airflow ClickHouse Plugin
Installation and dependencies
Requires apache-airflow and clickhouse-driver (installed automatically by pip). Primarily supports Airflow 2.0–2.3. Later versions are expected to work properly but may not be fully tested. Use plugin versions below 0.6.0 (e.g. 0.5.7.post1) to preserve compatibility with Airflow 1.10.6 (this version has long-term support on Google Cloud Composer).
Note on pandas dependency
To import ClickHouseOperator use: from airflow_clickhouse_plugin.operators.clickhouse_operator import ClickHouseOperator
The result of the last query is pushed to XCom.
To import ClickHouseHook use: from airflow_clickhouse_plugin.hooks.clickhouse_hook import ClickHouseHook
Supported kwargs of constructor ( __init__ method):
Supports all the methods of the Airflow BaseHook including:
The sensor fully inherits from the Airflow SQLSensor and therefore fully implements its interface, using ClickHouseHook to fetch the SQL execution result; it supports templating of the sql argument.
How to create an Airflow connection to ClickHouse
As the type of the new connection, choose SQLite. host should be set to the ClickHouse host’s IP or domain name.
There is no special ClickHouse connection type yet, so we use SQLite as the closest one.
If you use a secure connection to ClickHouse (this requires additional configuration on the ClickHouse side), set extra to {"secure": true}.
ClickHouse Connection schema
clickhouse_driver.Client is initialized with attributes stored in Airflow Connection attributes. The mapping of the attributes is listed below:
| Airflow Connection attribute | Client.__init__ argument |
|---|---|
| host | host |
| port | port |
| schema | database |
| login | user |
| password | password |
| extra | **kwargs |
database argument of ClickHouseOperator or ClickHouseHook overrides schema attribute of the Airflow connection.
For example, if the Airflow connection contains extra={"secure": true}, then Client.__init__ will receive the secure=True keyword argument in addition to other non-empty connection attributes.
If an Airflow connection attribute is not set then it is not passed to the Client at all. In that case the default value of the corresponding clickhouse_driver.Connection argument is used (e.g. user defaults to ‘default’).
This means that Airflow ClickHouse Plugin does not itself define any default values for the ClickHouse connection. You may fully rely on the default values of the clickhouse-driver version you use. The only exception is host: if the attribute of the Airflow connection is not set then ‘localhost’ is used.
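The mapping above can be sketched as a small helper that builds Client kwargs from an Airflow-connection-like object. This is an illustrative, hypothetical function, not the plugin’s actual code:

```python
def client_kwargs_from_connection(conn: dict) -> dict:
    """Build clickhouse_driver.Client keyword arguments from Airflow
    connection attributes.  Unset attributes are omitted so that
    clickhouse-driver defaults apply; host alone falls back to 'localhost'."""
    mapping = {"host": "host", "port": "port", "schema": "database",
               "login": "user", "password": "password"}
    kwargs = {client_arg: conn[attr]
              for attr, client_arg in mapping.items()
              if conn.get(attr)}
    kwargs.setdefault("host", "localhost")   # the single plugin-defined default
    kwargs.update(conn.get("extra") or {})   # extra -> **kwargs
    return kwargs

print(client_kwargs_from_connection(
    {"host": "ch1", "login": "bob", "extra": {"secure": True}}))
```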
Important note: don’t try to insert values using the literal form ch_hook.run('INSERT INTO some_ch_table VALUES (1)'). clickhouse-driver requires values for an INSERT query to be provided via parameters due to specifics of the native ClickHouse protocol.
How to run tests
Integration tests require access to a ClickHouse server. Tests use a connection URI defined via the environment variable AIRFLOW_CONN_CLICKHOUSE_DEFAULT, with clickhouse://localhost as the default.
A GitHub Actions workflow is set up for this project.
Run tests using Docker
Run ClickHouse server inside Docker:
The above command will open bash inside the container.
Install dependencies into container and run tests (execute inside container):
About
Airflow ClickHouse Plugin based on clickhouse-driver
clickhouse-driver
Python driver for ClickHouse
Welcome to clickhouse-driver
Welcome to clickhouse-driver’s documentation. Get started with Installation and then get an overview with the Quickstart where common queries are described.
User’s Guide
This part of the documentation focuses on step-by-step instructions for development with clickhouse-driver.
Clickhouse-driver is designed to communicate with the ClickHouse server from Python over the native protocol.
The ClickHouse server provides two protocols for communication: the HTTP protocol and the Native (TCP) protocol.
Each protocol has its own advantages and disadvantages. Here we focus on the advantages of the native protocol:
There is an asynchronous wrapper for clickhouse-driver: aioch. It’s available here.
API Reference
If you are looking for information on a specific function, class or method, this part of the documentation is for you.
Additional Notes
Legal information, changelog and contributing are here for the interested.
How to connect to ClickHouse with Python using SQLAlchemy
Introduction
ClickHouse is one of the fastest open-source databases on the market, and it claims to be faster than Spark. At WhiteBox we’ve tested this hypothesis with a 2+ billion row table and we can assure you it is! Our tests ran 3x faster for a complex aggregation with several filters.
Regarding this tutorial, all code and steps in this post have been tested in May 2021 on Ubuntu 20.04, so please don’t be evil and don’t complain if the code does not work in September 2025 😅.
Requirements
The requirements for this integration are the following:
ClickHouse server: It can be installed quite easily following the official documentation. Current version (21.4.5.46).
Setup
ClickHouse installation
This tutorial can be tested against any ClickHouse database. However, in order to get a local ClickHouse database to test the integration, it can be easily installed following the steps below:
Running the command “clickhouse-client” in the shell ensures that your ClickHouse installation is working properly. Besides, it can help you debug the SQLAlchemy DDL.
Python environment
These are the Python libraries required to run all the code in this tutorial:
Integration
SQLAlchemy setup
The following lines of code perform the SQLAlchemy standard connection:
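The original snippet is not reproduced here; a minimal sketch, assuming the clickhouse-sqlalchemy dialect and placeholder credentials, might look like:

```python
# clickhouse-sqlalchemy registers the clickhouse:// dialect, so the engine
# URI follows the usual SQLAlchemy shape.  The credentials and host below
# are placeholders, not values from the article.
user, password = "default", ""
host, port, database = "localhost", 8123, "default"

uri = f"clickhouse://{user}:{password}@{host}:{port}/{database}"
print(uri)

# With sqlalchemy and clickhouse-sqlalchemy installed you would then run:
#   from sqlalchemy import create_engine
#   engine = create_engine(uri)
#   connection = engine.connect()
```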
Create a new database
It is possible to test the current databases in ClickHouse from the command line connection using the command “SHOW DATABASES”. The following output should display on screen:
Create a new table
The following steps show how to create a MergeTree engine table in ClickHouse using the SQLAlchemy ORM model.
ORM model definition
A new table should appear in the new database:
INSERT
SELECT
Conclusions
Should ClickHouse replace traditional databases like Postgres, MySQL, Oracle? Definitely not. These databases have a lot of features that ClickHouse doesn’t currently have nor is intended to have in the future (primary key basic concepts, unique columns…). It can be considered an analytics database but not a fully functioning transactional one.
However, ClickHouse’s speed is so amazing that it should definitely be the go-to choice when there is a huge amount of tabular data.
mymarilyn/aioch
aioch is a library for accessing a ClickHouse database over the native interface from asyncio. It wraps features of clickhouse-driver for asynchronous usage.
The package can be installed using pip :
To install from source:
For more information see clickhouse-driver usage examples.
Other parameters are passed to the wrapped clickhouse-driver Client.
aioch is distributed under the MIT license.
romario076/ClickHouseConnector
Provides the clickhouse_driver.pandasConnector module, a ClickHouse connector for Python built on pandas.
Given your query, the module returns a pandas DataFrame.
About
ClickHouse connector for python using pandas
dbt-clickhouse 1.1.7
pip install dbt-clickhouse Copy PIP instructions
Released: Jul 11, 2022
The Clickhouse plugin for dbt (data build tool)
License: Apache Software License (MIT)
Requires: Python >=3.7
dbt-clickhouse
This plugin ports dbt functionality to Clickhouse.
We do not test against older versions of Clickhouse. The plugin uses syntax that requires version 22.1 or newer.
Installation
Use your favorite Python package manager to install the app from PyPI, e.g.
Supported features
Usage Notes
Database
Model Configuration
| Option | Description | Required? |
|---|---|---|
| engine | The table engine (type of table) to use when creating tables | Optional (default: MergeTree() ) |
| order_by | A tuple of column names or arbitrary expressions. This allows you to create a small sparse index that helps find data faster. | Optional (default: tuple() ) |
| partition_by | A partition is a logical combination of records in a table by a specified criterion. The partition key can be any expression from the table columns. | Optional |
| unique_key | A tuple of column names that uniquely identify rows. For more details on uniqueness constraints, see here. | Optional |
| inserts_only | This property is relevant only for incremental materialization. If set to True, incremental updates will be inserted directly into the target table without creating an intermediate table. This option can significantly improve performance and avoid memory limitations on big updates. | Optional |
| settings | A dictionary with custom settings for INSERT INTO and CREATE AS SELECT queries. | Optional |
Example Profile
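A minimal profile might look like the following sketch. The profile name and values are placeholders, not taken from this README; check the project documentation for the exact keys your plugin version supports:

```yaml
your_profile_name:
  target: dev
  outputs:
    dev:
      type: clickhouse
      schema: default        # ClickHouse database to build models in
      host: localhost
      port: 8123
      user: default
      password: ''
```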
Running Tests
Note: The only feature that is not supported and not tested is Ephemeral materialization.
Tests running command: pytest tests/integration
You can customize a few test params through environment variables. In order to provide custom params you’ll need to create a test.env file under the project root (remember not to commit this file!) and define the following env variables inside:
Original Author
ClickHouse wants to thank @silentsokolov for creating this connector and for their valuable contributions.
clickhouse-client-pool 0.0.2
pip install clickhouse-client-pool Copy PIP instructions
Released: Mar 28, 2021
No project description provided
License: MIT License (MIT)
Author: Eric Wang
Maintainer: Eric Wang
Intro
A naive, thread-safe clickhouse-client-pool based on clickhouse_driver.
Installation
clickhouse-client-pool is distributed on PyPI as a universal wheel, is available on Linux/macOS and Windows, and supports Python 2.7/3.6+.
License
clickhouse-client-pool is distributed under the terms of the MIT License.
clickhouse-driver
Python driver for ClickHouse
Performance
This section compares clickhouse-driver performance over the Native interface with the TSV and JSONEachRow formats available over the HTTP interface.
clickhouse-driver returns already parsed row items in Python data types. The driver performs all transformations for you.
When you read data over HTTP you may need to cast strings into Python types.
Test data
Sample data for testing is taken from ClickHouse docs.
Create database and table:
Download some data for the year 2017:
Insert data into ClickHouse:
Required packages
For fast JSON parsing we’ll use the ujson package:
Versions
Machine: Linux ThinkPad-T460 4.4.0-177-generic #207-Ubuntu SMP Mon Mar 16 01:16:10 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Python: CPython 3.6.5 (default, May 30 2019, 14:48:31) [GCC 5.4.0 20160609]
Benchmarking
The scripts below can be benchmarked with the following one-liner:
Time will measure:
Plain text without parsing
Let’s take the plain text response from the ClickHouse server as a baseline.
Fetching unparsed data with pure requests (1)
Parsed rows
A line split into elements will be considered “parsed” for the TSV format (2)
Now we cast each element to its data type (2.5)
JSONEachRow format can be loaded with json loads (3)
Get fully parsed rows with clickhouse-driver in Native format (4)
Iteration over rows
Iteration over TSV (5)
Now we cast each element to its data type (5.5)
Iteration over JSONEachRow (6)
Iteration over rows with clickhouse-driver in Native format (7)
Iteration over string rows
OK, but what if we need only string columns?
Iteration over TSV (8)
Iteration over JSONEachRow (9)
Iteration over string rows with clickhouse-driver in Native format (10)
Iteration over int rows
Iteration over TSV (11)
Iteration over JSONEachRow (12)
Iteration over int rows with clickhouse-driver in Native format (13)
Results
This table contains memory and timing benchmark results of snippets above.
JSON in table is shorthand for JSONEachRow.
| Rows | |||||
|---|---|---|---|---|---|
| 50k | 131k | 217k | 450k | 697k | |
| Plain text without parsing: timing | |||||
| Naive requests.get TSV (1) | 0.40 s | 0.67 s | 0.95 s | 1.67 s | 2.52 s |
| Naive requests.get JSON (1) | 0.61 s | 1.23 s | 2.09 s | 3.52 s | 5.20 s |
| Plain text without parsing: memory | |||||
| Naive requests.get TSV (1) | 49 MB | 107 MB | 165 MB | 322 MB | 488 MB |
| Naive requests.get JSON (1) | 206 MB | 564 MB | 916 MB | 1.83 GB | 2.83 GB |
| Parsed rows: timing | |||||
| requests.get TSV (2) | 0.81 s | 1.81 s | 3.09 s | 7.22 s | 11.87 s |
| requests.get TSV with cast (2.5) | 1.78 s | 4.58 s | 7.42 s | 16.12 s | 25.52 s |
| requests.get JSON (3) | 2.14 s | 5.65 s | 9.20 s | 20.43 s | 31.72 s |
| clickhouse-driver Native (4) | 0.73 s | 1.40 s | 2.08 s | 4.03 s | 6.20 s |
| Parsed rows: memory | |||||
| requests.get TSV (2) | 171 MB | 462 MB | 753 MB | 1.51 GB | 2.33 GB |
| requests.get TSV with cast (2.5) | 135 MB | 356 MB | 576 MB | 1.15 GB | 1.78 GB |
| requests.get JSON (3) | 139 MB | 366 MB | 591 MB | 1.18 GB | 1.82 GB |
| clickhouse-driver Native (4) | 135 MB | 337 MB | 535 MB | 1.05 GB | 1.62 GB |
| Iteration over rows: timing | |||||
| requests.get TSV (5) | 0.49 s | 0.99 s | 1.34 s | 2.58 s | 4.00 s |
| requests.get TSV with cast (5.5) | 1.38 s | 3.38 s | 5.40 s | 10.89 s | 16.59 s |
| requests.get JSON (6) | 1.89 s | 4.73 s | 7.63 s | 15.63 s | 24.60 s |
| clickhouse-driver Native (7) | 0.62 s | 1.28 s | 1.93 s | 3.68 s | 5.54 s |
| Iteration over rows: memory | |||||
| requests.get TSV (5) | 19 MB | 19 MB | 19 MB | 19 MB | 19 MB |
| requests.get TSV with cast (5.5) | 19 MB | 19 MB | 19 MB | 19 MB | 19 MB |
| requests.get JSON (6) | 20 MB | 20 MB | 20 MB | 20 MB | 20 MB |
| clickhouse-driver Native (7) | 56 MB | 70 MB | 71 MB | 71 MB | 71 MB |
| Iteration over string rows: timing | |||||
| requests.get TSV (8) | 0.40 s | 0.67 s | 0.80 s | 1.55 s | 2.18 s |
| requests.get JSON (9) | 1.14 s | 2.64 s | 4.22 s | 8.48 s | 12.96 s |
| clickhouse-driver Native (10) | 0.46 s | 0.91 s | 1.35 s | 2.49 s | 3.67 s |
| Iteration over string rows: memory | |||||
| requests.get TSV (8) | 19 MB | 19 MB | 19 MB | 19 MB | 19 MB |
| requests.get JSON (9) | 20 MB | 20 MB | 20 MB | 20 MB | 20 MB |
| clickhouse-driver Native (10) | 46 MB | 56 MB | 57 MB | 57 MB | 57 MB |
| Iteration over int rows: timing | |||||
| requests.get TSV (11) | 0.84 s | 2.06 s | 3.22 s | 6.27 s | 10.06 s |
| requests.get JSON (12) | 0.95 s | 2.15 s | 3.55 s | 6.93 s | 10.82 s |
| clickhouse-driver Native (13) | 0.43 s | 0.61 s | 0.86 s | 1.53 s | 2.27 s |
| Iteration over int rows: memory | |||||
| requests.get TSV (11) | 19 MB | 19 MB | 19 MB | 19 MB | 19 MB |
| requests.get JSON (12) | 20 MB | 20 MB | 20 MB | 20 MB | 20 MB |
| clickhouse-driver Native (13) | 41 MB | 48 MB | 48 MB | 48 MB | 49 MB |
Conclusion
If you need to get a significant number of rows from the ClickHouse server as text, then the TSV format is your choice. See the “Iteration over string rows” results.
It doesn’t matter which interface you use if you manipulate a small number of rows.
Creating a materialized view in Clickhouse
Setting up the machine
Our Python script from the previous articles needs to be connected to Clickhouse: it will be sending queries, so we need to open a few ports. In the AWS Dashboard, go to Network & Security, then Security Groups. Our machine belongs to the launch-wizard-1 group. Open it and look at the Inbound rules: we need to add rules as shown in the screenshot.
Configuring Clickhouse
Now let’s configure Clickhouse. Edit the config.xml file in the nano editor:
Consult the hotkey manual if, like us, you didn’t immediately figure out how to exit nano.
so that the database can be accessed from any IP address:
Creating a table and a materialized view
Open the client and create our database, in which we will later create the tables:
We will illustrate the same example of collecting data from Facebook. Campaign information can be updated frequently, and, as an exercise, we want to create a materialized view that will automatically recompute aggregates based on the collected cost data. The table in Clickhouse will be almost the same as the DataFrame from the previous article. As the table engine we use ReplacingMergeTree: it will remove duplicates by the sorting key:
And right away let’s create the materialized view:
Details of this recipe can be found in the Clickhouse blog.
The script
Let’s start writing the script. We’ll need a new library, clickhouse_driver, which lets us send queries to Clickhouse from a Python script:
This article only covers the additions to the script described in the post “Collecting data on Facebook ad campaigns”. Everything will work if you simply paste the code from this article into the previous script.
To make sure everything is fine, we can write the following query, which should print the names of all databases on the server:
If successful, we’ll see a list like this on the screen:
Suppose, for example, we want to look at data for the last three days. We’ll get these dates with the datetime library and convert them to the required format with the strftime() method:
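The date computation can be sketched like this (a minimal example; the exact format string is an assumption based on ClickHouse’s YYYY-MM-DD Date literals):

```python
from datetime import datetime, timedelta

# Build the last three days (today, yesterday, the day before),
# formatted the way a ClickHouse Date comparison expects.
today = datetime.now()
last_three_days = [(today - timedelta(days=i)).strftime("%Y-%m-%d")
                   for i in range(3)]
print(last_three_days)
```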
Let’s write the following query, which fetches all of the table’s columns for this period:
hatarist/clickhouse-cli
An unofficial command-line client for the ClickHouse DBMS. It implements some common and awesome things, such as:
But it works over the HTTP port, so there are some limitations for now:
Python 3.4+ is required.
~/.clickhouse-cli.rc is here at your service!
The available environment variables are:
The order of precedence is:
Reading from file / stdin
Inserting the data from file
Oh boy. It’s a very dirty (and very untested) hack that lets you define your own functions or, actually, whatever you want, by running a find & replace operation over the query before sending the query to the server.
Say, you often run queries that parse some JSON, so you use visitParamExtractString all the time:
About
A third-party client for the Clickhouse DBMS server.
clickhouse-repl 1.0.0
pip install clickhouse-repl Copy PIP instructions
Released: Jan 19, 2021
A toolkit for running ClickHouse queries interactively, leveraging the perks of an ipython console
License: MIT License (MIT)
Author: klic.tools
Requires: Python >=3.7
Maintainer: wesleybatista
clickhouse-repl
A toolkit for running ClickHouse queries interactively, leveraging the perks of an ipython console
Installation
Use the package manager pip to install clickhouse-repl.
Usage
Connecting
Password prompted
If no environment variable is set, password will be prompted.
Password provided
Avoid this one!
Depending on the shell and the settings in place, it is possible to bypass recording to history by prefixing the command with a double space.
Password from Environment Variable
Connecting to specific database
Specify the database name and your session will start in it automatically.
This is useful when your tables live somewhere other than ClickHouse's default database and you don't want to qualify the database in every query.
Running Queries
Using run_queries
Using client / c
These are shortcuts to the clickhouse_driver.Client instance created when a clickhouse-repl session starts.
You may use it for whatever purpose you find useful.
Contributing
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
clickhouse-driver-fork-0-2-4 0.0.2
pip install clickhouse-driver-fork-0-2-4 Copy PIP instructions
Released: Aug 23, 2022
A fix of version 0.2.4, for ClickHouse version 22.3
Navigation
Statistics
View statistics for this project via Libraries.io, or by using our public dataset on Google BigQuery
License: MIT License (This is the MIT license: http://www.opensource.org/licenses/mit-license.php Copyright (c) 2017 by Konstantin Lebedev. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.)
Maintainer: Carlos Yago
Requires: Python >=3.7
Maintainers
Classifiers
Project description
ClickHouse Python Driver
ClickHouse Python Driver with native (TCP) interface support.
Asynchronous wrapper is available here: https://github.com/mymarilyn/aioch
Features
Documentation
Usage
There are two ways to communicate with server:
Pure Client example:
License
ClickHouse Python Driver is distributed under the MIT license.
Project details
clickhouse_driver.errors.SocketTimeoutError #84
Comments
86085185 commented Apr 14, 2019
code:
from clickhouse_driver import Client
client = Client('xx.xxx.xx.xx', port=8123, database='default', user='default', password='')
client.execute('SHOW tables')
error:
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/clickhouse_driver/connection.py", line 232, in connect
self.receive_hello()
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/clickhouse_driver/connection.py", line 307, in receive_hello
packet_type = read_varint(self.fin)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/clickhouse_driver/reader.py", line 30, in read_varint
i = f.read_one()
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/clickhouse_driver/bufferedreader.py", line 48, in read_one
self.read_into_buffer()
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/clickhouse_driver/bufferedreader.py", line 143, in read_into_buffer
self.current_buffer_size = self.sock.recv_into(self.buffer)
socket.timeout: timed out
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "", line 1, in
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/clickhouse_driver/client.py", line 191, in execute
self.connection.force_connect()
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/clickhouse_driver/connection.py", line 166, in force_connect
self.connect()
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/clickhouse_driver/connection.py", line 240, in connect
'{} ({})'.format(e.strerror, self.get_description())
clickhouse_driver.errors.SocketTimeoutError: Code: 209. None (39.96.218.165:8123)
check:
curl xx.xxx.xx.xx:8123
ok
jdbc connect
ok
datagrip connect
ok
xzkostyan commented Apr 14, 2019
This driver uses the native protocol (port 9000). Port 8123 is used for the HTTP protocol. Use port 9000.
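The distinction can be sketched as follows (port_for is a hypothetical helper, not part of clickhouse-driver):

```python
# clickhouse-driver speaks ClickHouse's native (TCP) protocol, which by
# default listens on port 9000; port 8123 serves the HTTP interface that
# curl, JDBC, and DataGrip were reaching successfully above.
DEFAULT_PORTS = {"native": 9000, "http": 8123}

def port_for(protocol: str) -> int:
    """Return the conventional default port for a ClickHouse protocol."""
    try:
        return DEFAULT_PORTS[protocol]
    except KeyError:
        raise ValueError(f"unknown protocol: {protocol!r}")

# The failing snippet should therefore connect like this:
# from clickhouse_driver import Client
# client = Client('xx.xxx.xx.xx', port=port_for("native"))  # port 9000
```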
Python clickhouse driver
Table of Contents
Utility to import data into ClickHouse from MySQL (mainly) and/or CSV files
Requirements and Installation
The data reader can be installed either from the GitHub repo or from the PyPI repo.
Install dependencies: the MySQL repo (for mysql-community-devel), epel (for python3), and clickhouse-client from the Packagecloud repo at packagecloud.io. More details on installation are available at https://github.com/Altinity/clickhouse-rpm-install
and direct dependencies:
Install data reader
In case you’d like to play around with the sources this is the way to go.
MySQLdb package is used for communication with MySQL:
mysql-replication package is used for communication with MySQL also: https://github.com/noplay/python-mysql-replication
clickhouse-driver package is used for communication with ClickHouse: https://github.com/mymarilyn/clickhouse-driver
Clone sources from github
Also the following MySQL config options are required:
Expected results are:
Requirements and Limitations
Data reader understands INSERT SQL statements only. In practice this means that:
Operation General Schema
PyPy significantly improves performance. You should try it; a performance boost of up to 10x can be achieved. For example, you can start with the Portable PyPy distribution for Linux.
Install required modules
mysqlclient may require libmysqlclient-dev and gcc to be installed
Install them if need be
Now you can run the data reader via PyPy.
Let's walk through a test example of the tool's launch command-line options. This code snippet is taken from a shell script (see more details in the airline.ontime Test Case).
MySQL is already configured as described earlier. Let's migrate the existing data to ClickHouse and listen for newly arriving data in order to migrate it to ClickHouse on-the-fly.
Create ClickHouse table description
We have CREATE TABLE template stored in create_clickhouse_table_template.sql file.
Set up the sharding field and primary key. These columns must not be Nullable.
Create table in ClickHouse
Lock MySQL in order to avoid new data arriving while the data migration is running. Keep the mysql client open during the whole process.
This may take some time. Check that all the data is in ClickHouse.
Start clickhouse-mysql as a replication slave, so it will listen for new data coming:
Replication will pump data from MySQL into ClickHouse in the background, and after some time we'll see the following picture in ClickHouse:
Prepare tables templates in create_clickhouse.sql file
And create tables in ClickHouse
Pay attention to
Monitor the logs for a "first row in replication" notification of the following structure:
These records help us create the SQL statement for the data migration process. Of course, we can also peek into the MySQL database manually in order to understand which records will be the last ones copied by the migration process.
Pay attention to
Values for the WHERE clause in db.log_201801_1.sql are fetched from the first row in the replication log: INFO:first row in replication db.log_201801_1
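The record-to-WHERE-clause step described above can be sketched in Python (a hypothetical helper; the field names and output are illustrative, not the tool's actual behavior):

```python
def migration_where(first_replicated: dict) -> str:
    """Build a WHERE clause selecting everything strictly before the
    first row that replication will handle, so that migration and
    replication do not overlap."""
    return " AND ".join(f"{k} < {v!r}" for k, v in first_replicated.items())

# Suppose replication reports that its first row has id=1000:
clause = migration_where({"id": 1000})
```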
airline.ontime Test Case
airline.ontime Data Set in CSV files
You may want to adjust the directories where ZIP and CSV files are kept.
In airline_ontime_data_download.sh edit these lines:
You may want to adjust the number of files to download (downloading all of them may take some time).
Specify the year and month range as you wish:
Downloading can take some time.
airline.ontime MySQL Table
airline.ontime ClickHouse Table
airline.ontime Data Reader
You may want to adjust the PYTHON path, and the source and target hosts and usernames.
airline.ontime Data Importer
You may want to adjust the CSV files' location, the number of imported files, and the MySQL user/password used for import.
Testing General Schema
MySQL Data Types
BIT the number of bits per value, from 1 to 64
Date and Time Types
CHAR The range of M is 0 to 255. If M is omitted, the length is 1.
VARCHAR The range of M is 0 to 65,535
BINARY similar to CHAR
VARBINARY similar to VARCHAR
TINYBLOB maximum length of 255
TINYTEXT maximum length of 255
BLOB maximum length of 65,535
TEXT maximum length of 65,535
MEDIUMBLOB maximum length of 16,777,215
MEDIUMTEXT maximum length of 16,777,215
LONGBLOB maximum length of 4,294,967,295 or 4GB
LONGTEXT maximum length of 4,294,967,295 or 4GB
ENUM can have a maximum of 65,535 distinct elements
SET can have a maximum of 64 distinct members
JSON native JSON data type defined by RFC 7159
ClickHouse Data Types
Date number of days since 1970-01-01
DateTime Unix timestamp
UInt32 0 4294967295
UInt64 0 18446744073709551615
FixedString(N) string of N bytes (not characters or code points)
String The length is not limited. The value can contain an arbitrary set of bytes, including null bytes
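The integer bounds listed above follow directly from the type widths:

```python
# An N-bit unsigned integer ranges from 0 to 2**N - 1.
uint32_max = 2**32 - 1
uint64_max = 2**64 - 1

print(uint32_max)  # 4294967295
print(uint64_max)  # 18446744073709551615
```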
Date and Time Types
MySQL Test Tables
We have to split the test table into several tables because of this error, produced by MySQL:
Insert minimal acceptable values into the test table:
Insert maximum acceptable values into the test table:
clickhouse-migrations 0.3.0
pip install clickhouse-migrations Copy PIP instructions
Released: Aug 14, 2022
Navigation
Project links
Statistics
View statistics for this project via Libraries.io, or by using our public dataset on Google BigQuery
License: MIT
Tags clickhouse, migrations
Requires: Python >=3.6
Maintainers
Classifiers
Project description
Clickhouse Migrations
ClickHouse is known for its ability to store and query large datasets at scale.
Developing and maintaining large-scale database systems often requires constant changes to the actual DB system, and managing the migration scripts for these by hand is painful.
Features:
Installation
Usage
In command line
In code
| Parameter | Description | Default |
|---|---|---|
| db_host | Clickhouse database hostname | localhost |
| db_user | Clickhouse user | **** |
| db_password | ***** | **** |
| db_name | Clickhouse database name | None |
| migrations_home | Path to list of migration files | |
| create_db_if_no_exists | If db_name is not present, enabling this will create the database | True |
| multi_statement | Allow multiple statements in migration files | True |
Notes
The ClickHouse driver does not natively support executing multiple statements in a single query. To allow multiple statements in a single migration, you can use the multi_statement param. There are two important caveats:
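One way to picture what multi_statement enables (a naive sketch, not the library's actual parser; it would mishandle ';' inside string literals):

```python
def split_statements(sql: str) -> list:
    """Naively split a migration file into individual statements, since
    the driver executes one statement per call."""
    return [part.strip() for part in sql.split(";") if part.strip()]

migration = """
CREATE TABLE IF NOT EXISTS events (id UInt64) ENGINE = MergeTree ORDER BY id;
ALTER TABLE events ADD COLUMN ts DateTime;
"""
statements = split_statements(migration)  # two statements, run one by one
```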
clickhouse-driver
Python driver for ClickHouse
Supported types
Each ClickHouse type is deserialized to a corresponding Python type when SELECT queries are prepared. When serializing INSERT queries, clickhouse-driver accepts a broader range of Python types. The following ClickHouse types are supported by clickhouse-driver:
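The sections below describe each type in turn; as a rough, non-exhaustive summary of the Python types returned for SELECT queries (the dict itself is illustrative, not a driver API):

```python
import datetime

# ClickHouse column type -> Python type clickhouse-driver returns
# for SELECT queries (a small illustrative subset).
CH_TO_PY = {
    "Int32": int,
    "UInt64": int,
    "Float64": float,
    "String": str,
    "Date": datetime.date,
    "DateTime": datetime.datetime,
}
```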
[U]Int8/16/32/64
Float32/64
Date
DateTime('timezone')
Timezone support is new in version 0.0.11.
Integers are interpreted as seconds without timezone (UNIX timestamps). Integers can be used when insertion of a datetime column is a bottleneck.
The use_client_time_zone setting is taken into consideration.
You can cast a DateTime column to integers if you are facing performance issues when selecting a large number of rows.
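A sketch of preparing such integer values on the client side (assuming the datetimes are meant as UTC):

```python
from datetime import datetime, timezone

def to_unix_seconds(dt: datetime) -> int:
    """Convert a naive-UTC datetime to whole seconds since the epoch."""
    return int(dt.replace(tzinfo=timezone.utc).timestamp())

# Such integers can then be passed in place of datetime objects when
# inserting into a DateTime column, as noted above.
epoch = to_unix_seconds(datetime(1970, 1, 1))  # 0
```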
String/FixedString(N)
String columns are encoded/decoded using the UTF-8 encoding.
A String column can be returned without decoding; return values are then bytes:
If a column has the FixedString type, upon returning from SELECT it may contain trailing zeroes in accordance with ClickHouse's storage format. Trailing zeroes are stripped by the driver for convenience.
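The convention can be illustrated with plain bytes operations (a sketch, not the driver's code):

```python
def pad_fixed_string(value: bytes, n: int) -> bytes:
    """Zero-pad a value to N bytes, as FixedString(N) stores it."""
    if len(value) > n:
        raise ValueError("value longer than FixedString width")
    return value.ljust(n, b"\x00")

def strip_fixed_string(stored: bytes) -> bytes:
    """Strip the trailing zero bytes, as the driver does on SELECT."""
    return stored.rstrip(b"\x00")

stored = pad_fixed_string(b"abc", 8)   # b'abc\x00\x00\x00\x00\x00'
returned = strip_fixed_string(stored)  # b'abc'
```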
Enum8/16
For Python 2.7 the enum34 package is used.
Currently clickhouse-driver can't handle an empty enum value due to Python's Enum mechanics: an Enum member name must not be empty. See the issue and workaround.
conda-forge/clickhouse-driver-feedstock
Package license: MIT
Summary: Python driver with native interface for ClickHouse
Current release info
Installing clickhouse-driver from the conda-forge channel can be achieved by adding conda-forge to your channels with:
Once the conda-forge channel has been enabled, clickhouse-driver can be installed with conda :
It is possible to list all of the versions of clickhouse-driver available on your platform with conda :
Alternatively, mamba repoquery may provide more information:
conda-forge is a community-led conda channel of installable packages. In order to provide high-quality builds, the process has been automated into the conda-forge GitHub organization. The conda-forge organization contains one repository for each of the installable packages. Such a repository is known as a feedstock.
A feedstock is made up of a conda recipe (the instructions on what and how to build the package) and the necessary configurations for automatic building using freely available continuous integration services. Thanks to the awesome service provided by Azure, GitHub, CircleCI, AppVeyor, Drone, and TravisCI it is possible to build and upload installable packages to the conda-forge Anaconda-Cloud channel for Linux, Windows and macOS.
For more information please check the conda-forge documentation.
If you would like to improve the clickhouse-driver recipe or build a new package version, please fork this repository and submit a PR. Upon submission, your changes will be run on the appropriate platforms to give the reviewer an opportunity to confirm that the changes result in a successful build. Once merged, the recipe will be re-built and uploaded automatically to the conda-forge channel, whereupon the built conda packages will be available for everybody to install and use from the conda-forge channel. Note that all branches in the conda-forge/clickhouse-driver-feedstock are immediately built and any created packages are uploaded, so PRs should be based on branches in forks and branches in the main repository should only be used to build distinct package versions.
In order to produce a uniquely identifiable distribution:
clickhouse-driver
Python driver for ClickHouse
Installation
Python Version
Clickhouse-driver supports Python 3.4 and newer, Python 2.7, and PyPy.
Build Dependencies
Example for python:alpine docker image:
By default there are wheels for Linux, Mac OS X and Windows.
Packages for Linux and Mac OS X are available for python: 2.7, 3.4, 3.5, 3.6, 3.7, 3.8.
Packages for Windows are available for python: 2.7, 3.5, 3.6, 3.7, 3.8.
Dependencies
These distributions will be installed automatically when installing clickhouse-driver.
Optional dependencies
These distributions will not be installed automatically. Clickhouse-driver will detect and use them if you install them.
Installation from PyPI
The package can be installed using pip :
You can install extras packages if you need compression support. Example of LZ4 compression requirements installation:
You also can specify multiple extras by using comma. Install LZ4 and ZSTD requirements:
Installation from GitHub
The development version can be installed directly from GitHub:
clickhouse-driver
Python driver for ClickHouse
Installation
Python Version
Clickhouse-driver supports Python 3.4 and newer and PyPy.
Build Dependencies
Example for python:alpine docker image:
By default there are wheels for Linux, Mac OS X and Windows.
Packages for Linux and Mac OS X are available for python: 3.6 – 3.10.
Packages for Windows are available for python: 3.6 – 3.10.
Starting from version 0.2.3 there are wheels for musl-based Linux distributions.
Dependencies
These distributions will be installed automatically when installing clickhouse-driver.
Optional dependencies
These distributions will not be installed automatically. Clickhouse-driver will detect and use them if you install them.
Installation from PyPI
The package can be installed using pip :
You can install extras packages if you need compression support. Example of LZ4 compression requirements installation:
You also can specify multiple extras by using comma. Install LZ4 and ZSTD requirements:
NumPy support
You can install additional packages (NumPy and Pandas) if you need NumPy support:
Supported NumPy versions are limited by the numpy package's Python support.
Installation from GitHub
The development version can be installed directly from GitHub:
infi.clickhouse-orm 0.5.1
pip install infi.clickhouse-orm==0.5.1 Copy PIP instructions
Released: Jun 28, 2016
A Python library for working with the ClickHouse database
Navigation
Project links
Statistics
View statistics for this project via Libraries.io, or by using our public dataset on Google BigQuery
License: Python Software Foundation License (PSF)
Maintainers
Classifiers
Project description
Overview
This project is a simple ORM for working with the ClickHouse database. It allows you to define model classes whose instances can be written to the database and read from it.
Installation
To install infi.clickhouse_orm:
Usage
Defining Models
Models are defined in a way reminiscent of Django’s ORM:
It is possible to provide a default value for a field, instead of its “natural” default (empty string for string fields, zero for numeric fields etc.).
See below for the supported field types and table engines.
Using Models
Once you have a model, you can create model instances:
When values are assigned to model fields, they are immediately converted to their Pythonic data type. In case the value is invalid, a ValueError is raised:
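A self-contained sketch of this eager conversion (a toy field class, not the library's actual code):

```python
import datetime

class DateField:
    """Toy field that converts assigned values to datetime.date and
    raises ValueError for anything it cannot convert."""

    def to_python(self, value):
        if isinstance(value, datetime.date):
            return value
        if isinstance(value, str):
            try:
                return datetime.date.fromisoformat(value)
            except ValueError:
                raise ValueError(f"invalid value for DateField: {value!r}")
        raise ValueError(f"invalid value for DateField: {value!r}")
```

A real model would run such a conversion in the field's `__set__`/assignment path, which is why invalid values fail immediately rather than at insert time.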
Inserting to the Database
To write your instances to ClickHouse, you need a Database instance:
This automatically connects to http://localhost:8123 and creates a database called my_test_db, unless it already exists. If necessary, you can specify a different database URL and optional credentials:
Using the Database instance you can create a table for your model, and insert instances to it:
The insert method can take any iterable of model instances, but they all must belong to the same model class.
Reading from the Database
Loading model instances from the database is simple:
Do not include a FORMAT clause in the query, since the ORM automatically sets the format to TabSeparatedWithNamesAndTypes.
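TabSeparatedWithNamesAndTypes is a simple text format: the first line carries column names, the second line column types, and the remaining lines the data rows, all tab-separated. A stdlib-only sketch of parsing it (illustrative, not the ORM's actual parser):

```python
def parse_tsv_with_names_and_types(body: str):
    """Split a TabSeparatedWithNamesAndTypes body into (names, types, rows)."""
    lines = body.rstrip("\n").split("\n")
    names = lines[0].split("\t")
    types = lines[1].split("\t")
    rows = [line.split("\t") for line in lines[2:]]
    return names, types, rows

sample = "id\tname\nUInt64\tString\n1\talice\n"
names, types, rows = parse_tsv_with_names_and_types(sample)
```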
It is possible to select only a subset of the columns, and the rest will receive their default values:
Ad-Hoc Models
Specifying a model class is not required. In case you do not provide a model class, an ad-hoc class will be defined based on the column names and types returned by the query:
This is a very convenient feature that saves you the need to define a model for each query, while still letting you work with Pythonic column values and an elegant syntax.
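A self-contained sketch of the ad-hoc idea (not the library's actual implementation): build a class on the fly from the column names and types a query reports:

```python
def make_adhoc_model(columns):
    """Create a lightweight class from (name, python_type) pairs,
    e.g. [("id", int), ("name", str)], coercing values on construction."""
    def __init__(self, **kwargs):
        for name, typ in columns:
            # Missing columns receive the type's default value (0, "", ...).
            setattr(self, name, typ(kwargs.get(name, typ())))
    return type("AdHocModel", (), {"__init__": __init__})

Row = make_adhoc_model([("id", int), ("name", str)])
row = Row(id="7", name="alice")  # "7" is coerced to the int 7
```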
Counting
The Database class also supports counting records easily:
Field Types
Currently the following field types are supported:
Table Engines
Each model must have an engine instance, used when creating the table in ClickHouse.
To define a MergeTree engine, supply the date column name and the names (or expressions) for the key columns:
You may also provide a sampling expression:
A CollapsingMergeTree engine is defined in a similar manner, but requires also a sign column:
For a SummingMergeTree you can optionally specify the summing columns:
Data Replication
Any of the above engines can be converted to a replicated engine (e.g. ReplicatedMergeTree) by adding two parameters, replica_table_path and replica_name:
Development
After cloning the project, run the following commands:
clickhouse-driver
Python driver for ClickHouse
Welcome to clickhouse-driver
Welcome to clickhouse-driver's documentation. Get started with Installation and then get an overview with the Quickstart where common queries are described.
User's Guide
This part of the documentation focuses on step-by-step instructions for development with clickhouse-driver.
Clickhouse-driver is designed to communicate with the ClickHouse server from Python over the native protocol.
ClickHouse server provides two protocols for communication: the HTTP protocol and the Native (TCP) protocol.
Each protocol has its own advantages and disadvantages. Here we focus on the advantages of the native protocol:
There is an asynchronous wrapper for clickhouse-driver: aioch. It’s available here.
API Reference
If you are looking for information on a specific function, class or method, this part of the documentation is for you.
Additional Notes
Legal information, changelog and contributing are here for the interested.
ultram4rine/sqltools-clickhouse-driver
SQLTools ClickHouse Driver
ClickHouse driver for SQLTools VS Code extension.
After installation you will be able to explore tables and views, run queries, etc. For more details see SQLTools documentation.
Don't use ; at the end of the query. Since this driver uses the @apla/clickhouse library, it automatically adds a FORMAT statement after the query. In this case SQLTools thinks that you are sending multiple queries, which is not supported (yet).
Use LIMIT when selecting from a table which stores more than about 100,000 records.
clickhouse-driver
Python driver for ClickHouse
Welcome to clickhouse-driver
Welcome to clickhouse-driver's documentation. Get started with Installation and then get an overview with the Quickstart where common queries are described.
User's Guide
This part of the documentation focuses on step-by-step instructions for development with clickhouse-driver.
Clickhouse-driver is designed to communicate with the ClickHouse server from Python over the native protocol.
ClickHouse server provides two protocols for communication:
Each protocol has its own advantages and disadvantages. Here we focus on the advantages of the native protocol:
Once again: clickhouse-driver uses the native protocol (port 9000).
There is an asynchronous wrapper for clickhouse-driver: aioch. It’s available here.
API Reference
If you are looking for information on a specific function, class or method, this part of the documentation is for you.
Additional Notes
Legal information, changelog and contributing are here for the interested.
Python clickhouse driver
ODBC Driver for ClickHouse
This is the official ODBC driver implementation for accessing ClickHouse as a data source.
For more information on ClickHouse go to ClickHouse home page.
For more information on what ODBC is go to ODBC Overview.
The canonical repo for this driver is located at https://github.com/ClickHouse/clickhouse-odbc.
See LICENSE file for licensing information.
Table of contents
Pre-built binary packages of the release versions of the driver are available for the most common platforms at:
Note that since ODBC drivers are not used directly by a user, but rather accessed through applications, which in turn access the driver through an ODBC driver manager, users have to install the driver for the same architecture (32- or 64-bit) as the application that is going to access the driver. Moreover, both the driver and the application must be compiled for (and actually use at run time) the same ODBC driver manager implementation (we call these "ODBC providers" here). There are three supported ODBC providers:
If you have Homebrew installed (usually applicable to macOS only, but can also be available in Linux), just execute:
If you don't see a package that matches your platform under Releases, or the version of your system differs significantly from those of the available packages, or you want to try a bleeding-edge version of the code that hasn't been released yet, you can always build the driver manually from sources:
Native packages will have all the dependency information so when you install the driver using a native package, all required run-time packages will be installed automatically. If you use manual packaging, i.e., just extract driver binaries to some folder, you also have to make sure that all the run-time dependencies are satisfied in your system manually:
The first step usually consists of registering the driver so that the corresponding ODBC provider is able to locate it.
The next step is defining one or more DSNs, associated with the newly registered driver, and setting driver-specific parameters in the body of those DSN definitions.
All this involves modifying dedicated registry keys in the case of MDAC, or editing odbcinst.ini (for driver registration) and odbc.ini (for DSN definition) files for UnixODBC or iODBC, directly or indirectly.
This will be performed automatically using some default values if you are installing the driver using native installers.
Otherwise, if you are configuring manually, or need to modify the default configuration created by the installer, please see the exact locations of files (or registry keys) that need to be modified in the corresponding section below:
The list of DSN parameters recognized by the driver is as follows:
URL query string
Some of the configuration parameters can be passed to the server as part of the query string of the URL.
The list of parameters in the query string of the URL that are also recognized by the driver is as follows:
| Parameter | Default value | Description |
|---|---|---|
| database | default | Database name to connect to |
| default_format | ODBCDriver2 | Default wire format of the resulting data that the server will send to the driver. Formats supported by the driver are: ODBCDriver2 and RowBinaryWithNamesAndTypes |
Note that currently there is a difference in timezone handling between the ODBCDriver2 and RowBinaryWithNamesAndTypes formats: in ODBCDriver2, date and time values are presented to the ODBC application in the server's timezone, whereas in RowBinaryWithNamesAndTypes they are converted to the local timezone. This behavior will be changed/parametrized in the future. If the server and ODBC application timezones are the same, date and time handling will effectively be identical between these two formats.
Troubleshooting: driver manager tracing and driver logging
To debug issues with the driver, first things that need to be done are:
Building from sources
The general requirements for building the driver from sources are as follows:
Additional requirements exist for each platform, which also depend on whether packaging and/or testing is performed.
See the exact steps for each platform in the corresponding section below:
The list of configuration options recognized during the CMake generation step is as follows:
Run-time dependencies: Windows
All modern Windows systems come with preinstalled MDAC driver manager.
Run-time dependencies: macOS
Execute the following in the terminal (assuming you have Homebrew installed):
Execute the following in the terminal (assuming you have Homebrew installed):
Run-time dependencies: Red Hat/CentOS
Execute the following in the terminal:
Execute the following in the terminal:
Run-time dependencies: Debian/Ubuntu
Execute the following in the terminal:
Execute the following in the terminal:
Configuration: MDAC/WDAC (Microsoft/Windows Data Access Components)
To configure already installed drivers and DSNs, or create new DSNs, use Microsoft ODBC Data Source Administrator tool:
For full description of ODBC configuration mechanism in Windows, as well as for the case when you want to learn how to manually register a driver and have a full control on configuration in general, see:
Note that the keys are subject to the "Registry Redirection" mechanism, with caveats.
You can find sample configuration for this driver here (just map the keys to corresponding sections in registry):
In short, usually you will end up editing /etc/odbcinst.ini and /etc/odbc.ini for system-wide driver and DSN entries, and ~/.odbc.ini for user-wide driver and DSN entries.
For more info, see:
You can find sample configuration for this driver here:
These samples can be added to the corresponding configuration files using the odbcinst tool (assuming the package is installed under /usr/local ):
In short, usually you will end up editing /etc/odbcinst.ini and /etc/odbc.ini for system-wide driver and DSN entries, and ~/.odbc.ini for user-wide driver and DSN entries.
In macOS, if those INI files exist, they usually are symbolic or hard links to /Library/ODBC/odbcinst.ini and /Library/ODBC/odbc.ini for system-wide, and ~/Library/ODBC/odbc.ini for user-wide configs respectively.
For more info, see:
You can find sample configuration for this driver here:
Enabling driver manager tracing: MDAC/WDAC (Microsoft/Windows Data Access Components)
Comprehensive explanations (possibly, with some irrelevant vendor-specific details though) on how to enable ODBC driver manager tracing could be found at the following links:
Enabling driver manager tracing: UnixODBC
Comprehensive explanations (possibly, with some irrelevant vendor-specific details though) on how to enable ODBC driver manager tracing could be found at the following links:
Enabling driver manager tracing: iODBC
Comprehensive explanations (possibly, with some irrelevant vendor-specific details though) on how to enable ODBC driver manager tracing could be found at the following links:
Building from sources: Windows
The CMake bundled with recent versions of Visual Studio can be used.
The SDK required for building the ODBC driver is included in the Windows SDK, which in turn is also bundled with Visual Studio.
All of the following commands must be issued in a Visual Studio Command Prompt:
Clone the repo with submodules:
Enter the cloned source tree, create a temporary build folder, and generate the solution and project files in it:
Build the generated solution in-place:
...and, optionally, run tests (note that for non-unit tests, preconfigured driver and DSN entries that point to the binaries generated in this build folder must exist):
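Taken together, the Windows steps above might look like the following sketch (the architecture flag, build type, and test invocation are illustrative assumptions):

```shell
git clone --recursive https://github.com/ClickHouse/clickhouse-odbc.git
cd clickhouse-odbc
mkdir build
cd build

:: Generate the solution for the default Visual Studio toolchain
cmake -A x64 -DCMAKE_BUILD_TYPE=RelWithDebInfo ..

:: Build in-place
cmake --build . --config RelWithDebInfo

:: Optional; non-unit tests need a preconfigured driver/DSN
ctest -C RelWithDebInfo
```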
Building from sources: macOS
You will need macOS 10.14 or later, Xcode 10 or later with Command Line Tools installed, and an up-to-date Homebrew installation.
Install Homebrew using the following command, and follow the printed instructions for any additional steps required to complete the installation:
Then install the latest Xcode from the App Store. Open it at least once to accept the end-user license agreement and automatically install the required components.
Then make sure that the latest Command Line Tools are installed and selected in the system:
Build-time dependencies: iODBC
Execute the following in the terminal:
Build-time dependencies: UnixODBC
Execute the following in the terminal:
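As a sketch, the Homebrew installs for the two driver-manager variants might look like this (the package names are assumptions based on the project's typical dependencies; check the repository for the authoritative list):

```shell
# iODBC variant (package names assumed)
brew install git cmake poco openssl libiodbc

# UnixODBC variant (package names assumed)
brew install git cmake poco openssl unixodbc
```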
Clone the repo recursively with submodules:
Enter the cloned source tree, create a temporary build folder, and generate a Makefile for the project in it:
Build the generated solution in-place:
...and, optionally, run tests (note that for non-unit tests, preconfigured driver and DSN entries that point to the binaries generated in this build folder must exist):
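The clone-and-build sequence above, shared by the macOS and Linux instructions, can be sketched as follows (the build type is illustrative):

```shell
git clone --recursive https://github.com/ClickHouse/clickhouse-odbc.git
cd clickhouse-odbc
mkdir build && cd build

# Generate the Makefile and build in-place
cmake -DCMAKE_BUILD_TYPE=RelWithDebInfo ..
cmake --build .

# Optional; non-unit tests need a preconfigured driver/DSN
ctest
```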
Building from sources: Red Hat/CentOS
Build-time dependencies: UnixODBC
Execute the following in the terminal:
Build-time dependencies: iODBC
Execute the following in the terminal:
All of the following commands must be issued right after this one command, in the same terminal session:
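On CentOS 7, for example, the dependency installation and compiler-enablement steps might look like the following sketch (the package names and the devtoolset version are assumptions; adjust them to your distribution):

```shell
# Package names and devtoolset version are illustrative
sudo yum install -y epel-release centos-release-scl
sudo yum install -y git cmake3 openssl-devel unixODBC-devel devtoolset-9

# Enter a shell with the newer toolchain; run all build commands inside it
scl enable devtoolset-9 -- bash
```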
Clone the repo with submodules:
Enter the cloned source tree, create a temporary build folder, and generate a Makefile for the project in it:
Build the generated solution in-place:
...and, optionally, run tests (note that for non-unit tests, preconfigured driver and DSN entries that point to the binaries generated in this build folder must exist):
Building from sources: Debian/Ubuntu
Build-time dependencies: UnixODBC
Execute the following in the terminal:
Build-time dependencies: iODBC
Execute the following in the terminal:
This assumes that the system cc and c++ point to compilers that satisfy the minimum requirements from Building from sources.
If the version of cmake is not recent enough, you can install a newer version by following the instructions from one of these pages:
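On Debian/Ubuntu, the dependency installation might look like the following sketch (package names are assumptions; check the repository for the authoritative list):

```shell
sudo apt update

# UnixODBC variant (package names assumed)
sudo apt install build-essential git cmake libssl-dev unixodbc-dev

# or, for the iODBC variant:
# sudo apt install build-essential git cmake libssl-dev libiodbc2-dev
```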
Clone the repo with submodules:
Enter the cloned source tree, create a temporary build folder, and generate a Makefile for the project in it:
Build the generated solution in-place:
...and, optionally, run tests (note that for non-unit tests, preconfigured driver and DSN entries that point to the binaries generated in this build folder must exist):