Error while loading shared libraries in CentOS using Docker Compose command

Statement : While running the Docker Compose command on my CentOS machine, I got the following error: "docker-compose: error while loading shared libraries: libz.so.1: failed to map segment from shared object: Operation not permitted"

Solution : You just need to remount the /tmp directory with exec permission (the error occurs because /tmp is mounted with the noexec flag, so the docker-compose binary cannot map its shared libraries there). Use the following command for the same –

sudo mount /tmp -o remount,exec

Hope this helps to resolve this issue. 🙂


Working with rclone to sync remote storage files (AWS, Azure, etc.) with the local machine

Statement : The sole purpose of this post is to learn how to keep the remote data stored in AWS S3, Azure Blob Storage, etc. in sync with the local file system.

Installation : Install rclone from the download link based on your machine (Windows, Linux, Mac, etc.). I have worked on a Mac, so I downloaded the corresponding file.

Steps : In my case, I have stored my files in Azure Blob Storage as well as an AWS S3 bucket. Given below are the steps to keep that data in sync with a local directory.

  • Go to the downloaded folder and execute the following command to configure rclone –

tangupta-mbp:rclone-v1.39-osx-amd64 tangupta$ ./rclone config

  • Initially, no remote will be found, so you need to create a new one.
No remotes found - make a new one
n) New remote
s) Set configuration password
q) Quit config
n/s/q> n
name> remote
  • Now, it will ask for the type of storage to configure (AWS, Azure, Box, Google Drive, etc.). I have chosen Azure Blob Storage.
Storage> azureblob
  • Now it will ask for the Azure Blob Storage details such as account name, key and endpoint (keep the endpoint blank).
Storage Account Name
account> your_created_account_name_on_azure
Storage Account Key
key> generated_key_to_be_copied_through_azure_portal
Endpoint for the service - leave blank normally.
endpoint> 
--------------------
y) Yes this is OK
e) Edit this remote
d) Delete this remote
y/e/d> y
  • To list all the containers created on the Azure portal under this account name –
tangupta$./rclone lsd remote:

             -1 2018-02-05 12:37:03        -1 test

  • To list all the files uploaded or created under the container (test in my case) –
tangupta$./rclone ls remote:test

    90589 Gaurav.pdf

    48128 Resume shashank.doc

    26301 Resume_Shobhit.docx

    29366 Siddharth..docx

  • To copy all the files from the local machine to the container (or vice versa, by swapping the source and destination) –

tangupta$./rclone copy /Users/tanuj/airflow/dag remote:test

  • Most importantly, now use the below command to sync the local file system to the remote container, deleting any excess files in the container.

tangupta$./rclone sync /Users/tanuj/airflow/dag remote:test

The good thing about rclone sync is that it transfers only the content that has changed. In the same way, you can play with AWS S3 storage to sync files. Apart from these commands, rclone also gives us copy, move and delete commands to do the respective jobs in the appropriate way.
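If you want to run the same sync from a script (for example, from a scheduled job), a thin wrapper around the rclone binary is enough. Below is a minimal sketch, assuming rclone is on the PATH and the remote named "remote" has been configured as above; the local path is the same placeholder used earlier –

import subprocess

# Mirror the local DAG folder to the "test" container on the configured remote.
# rclone exits non-zero on failure, so check_call raises if the sync fails.
subprocess.check_call([
    "rclone", "sync",
    "/Users/tanuj/airflow/dag",   # local source directory
    "remote:test",                # destination container on the configured remote
])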

Now, one can also use the rsync command to copy/sync/back up contents between different directories, both locally and remotely. It is a widely used command that transfers only the differences (the changed portions of files) between the source and the destination.

tangupta$ rsync -avc --delete /Users/tanuj/airflow/test /Users/tanuj/airflow/dags

Hope this works for you. Enjoy 🙂

Working with SQLite Database

Statement : The main purpose of this post is to learn how to install SQLite on a Mac machine and play with the basic commands.

Installation :

Use the following command to install sqlite –

brew install sqlite3

Frequently Used SQLite Commands :

There are a few steps to open an SQLite database and work with its tables (a Python version of the same flow is sketched after the list):

  1. Start the SQLite shell through the command prompt –
    tangupta$ sqlite3
  2. Open an SQLite database file –
    sqlite> .open "/Users/tangupta/database.db"
  3. Get the full table content –
    SELECT * FROM tablename;
  4. List all of the available SQLite prompt commands –
    .help
  5. List the tables in your database –
    .tables
  6. Create table in the database –
    CREATE TABLE tableName (id integer, name text);
  7. Insert the data into the created table –
    INSERT INTO tableName VALUES (1, 'Tanuj');
  8. Add a column into the table –
     ALTER TABLE tableName ADD COLUMN isMarried char(1);
  9. Update column data in the table –
    UPDATE tableName SET column1 = 'value1', column2 = 'value2' WHERE condition;
  10. Exit from the sqlite database –
     .exit
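Here is the Python version mentioned above, using the built-in sqlite3 module. It is only a sketch that mirrors the commands in the list; the database path and table layout are the same examples used in the steps –

import sqlite3

# Open (or create) the database file used in the steps above.
connection = sqlite3.connect("/Users/tangupta/database.db")
cursor = connection.cursor()

# Create the table, insert a row and read it back.
cursor.execute("CREATE TABLE IF NOT EXISTS tableName (id integer, name text)")
cursor.execute("INSERT INTO tableName VALUES (?, ?)", (1, 'Tanuj'))
connection.commit()

for row in cursor.execute("SELECT * FROM tableName"):
    print(row)

connection.close()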

Hope this helps you to get a basic understanding of the SQLite database. Enjoy 🙂

Working in distributed mode with Airflow using Local and Celery Executor

Working with Local Executor:

LocalExecutor is widely used when there is a moderate amount of jobs to be executed. In this mode, the worker picks up jobs and runs them locally via multiprocessing.

  1. Install PostgreSQL or MySQL to support parallelism with any executor other than the SequentialExecutor. Use the following command to do so – $ brew install postgresql
  2. Modify the configuration in AIRFLOW_HOME/airflow.cfg

Change the executor to Local Executor

executor = LocalExecutor

Change the meta db configuration

sql_alchemy_conn = postgresql+psycopg2://user_name:password@host_name/database_name

3. Restart Airflow to test your DAGs (a minimal test DAG is sketched at the end of this section)

$ airflow initdb
$ airflow webserver
$ airflow scheduler

  4. Establish the DB connection via the Airflow Admin UI –
  • Go to the Airflow Admin UI: Admin -> Connections -> Create
    Connection ID: name of your connection, used inside the DAG to create tasks
    Connection Type: Postgres
    Host: database server IP / localhost
    Schema: database name
    Username: user name
    Password: password
  • Encrypt your credentials

Generate a valid Fernet key and place it into airflow.cfg

             FERNET_KEY=$(python -c "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())")
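Here is the minimal test DAG mentioned in step 3 – only a sketch, with placeholder dag_id, task names and sleep commands. With the LocalExecutor, both tasks should run at the same time, which is an easy way to confirm that parallelism is working –

from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

# Placeholder DAG to verify parallel execution under the LocalExecutor.
dag = DAG(
    dag_id="local_executor_smoke_test",
    start_date=datetime(2018, 1, 1),
    schedule_interval=None,
)

# Two independent tasks; with the LocalExecutor they should run concurrently.
task_a = BashOperator(task_id="sleep_a", bash_command="sleep 30", dag=dag)
task_b = BashOperator(task_id="sleep_b", bash_command="sleep 30", dag=dag)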

Working with Celery Executor:

CeleryExecutor is the best choice for production when there is a heavy amount of jobs to be executed. In this mode, remote workers pick up the jobs and run them as scheduled, with the load balanced across workers.

  • Install and configure a message queuing/passing engine on the Airflow server: RabbitMQ, Redis, etc. –

  1. Install RabbitMQ using $ brew install rabbitmq
  2. Add the following path to your .bash_profile or .profile – PATH=$PATH:/usr/local/sbin
  3. Start the RabbitMQ server using the following commands –
    $ sudo rabbitmq-server             # run in foreground; or
    $ sudo rabbitmq-server -detached   # run in background
  4. Configure RabbitMQ: create a user and grant privileges –
    $ rabbitmqctl add_user rabbitmq_user_name rabbitmq_password
    $ rabbitmqctl add_vhost rabbitmq_virtual_host_name
    $ rabbitmqctl set_user_tags rabbitmq_user_name rabbitmq_tag_name
    $ rabbitmqctl set_permissions -p rabbitmq_virtual_host_name rabbitmq_user_name ".*" ".*" ".*"
  5. Make the RabbitMQ server open to remote connections –
    Go to /usr/local/etc/rabbitmq/rabbitmq-env.conf and change NODE_IP_ADDRESS from 127.0.0.1 to 0.0.0.0

  • Modify the configuration in AIRFLOW_HOME/airflow.cfg –
    1. Change the executor to Celery Executor
    executor = CeleryExecutor
    2. Set up the RabbitMQ broker URL and the Celery result backend
    broker_url = amqp://rabbitmq_user_name:rabbitmq_password@host_name/rabbitmq_virtual_host_name # host_name=localhost on the server
    celery_result_backend = the meta DB URL (the sql_alchemy_conn configured earlier), the RabbitMQ broker URL (same as above), or any other eligible result backend
  • Open the meta DB (PostgreSQL) to remote connections
             1. Modify /usr/local/var/postgres/pg_hba.conf to add Client Authentication Record

    host    all         all         0.0.0.0/0          md5 # 0.0.0.0/0 stands for all ips; use CIDR address to restrict access; md5 for pwd authentication
    

              2. Change the listen address in /usr/local/var/postgres/postgresql.conf
                             listen_addresses = '*'
              3. Create a user and grant privileges (run the commands below as the postgres superuser)
                           $ CREATE USER your_postgres_user_name WITH ENCRYPTED PASSWORD 'your_postgres_pwd';
                           $ GRANT ALL PRIVILEGES ON DATABASE your_database_name TO your_postgres_user_name;
                           $ GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA public TO your_postgres_user_name;

    4. Restart the PostgreSQL server and test it out.
                          $ brew services restart postgresql

                           $ psql -U [postgres_user_name] -h [postgres_host_name] -d [postgres_database_name]

  • IMPORTANT: update your sql_alchemy_conn string in airflow.cfg
  • Start your Airflow workers; on each worker machine, run: $ airflow worker
  • Your Airflow workers should now be picking up and running jobs from the Airflow server.
  • Use the GitHub link to go through all the samples. Enjoy Coding 🙂

Passing and accessing run-time arguments to an Airflow DAG through the CLI:

  • One can pass run-time arguments at the time of triggering the DAG using the below command –
                   $ airflow trigger_dag dag_id --conf '{"key":"value"}'

  • Now, there are two ways in which one can access the parameters passed to the airflow trigger_dag command –
  1. In the callable method defined for the operator, one can access the params as kwargs['dag_run'].conf.get('key')
  2. If the field where you are using the value is a templated field, one can use {{ dag_run.conf['key'] }}

Note : The schedule_interval of the externally triggerable DAG must be set to None for the above approaches to work (see the sketch below). Use the GitHub link to go through all the samples. Enjoy Coding 🙂
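Here is the sketch referred to above, showing both ways of reading the --conf payload inside a DAG. It is only an illustration; the dag_id and task names are placeholders, everything else follows the commands and snippets already mentioned –

from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from airflow.operators.python_operator import PythonOperator

# schedule_interval is None because the DAG is triggered externally via the CLI.
dag = DAG(dag_id="conf_demo", start_date=datetime(2018, 1, 1), schedule_interval=None)

def read_conf(**kwargs):
    # Way 1: read the value inside the Python callable.
    value = kwargs['dag_run'].conf.get('key')
    print(value)

python_task = PythonOperator(
    task_id="read_conf",
    python_callable=read_conf,
    provide_context=True,  # needed in Airflow 1.x so dag_run arrives in kwargs
    dag=dag,
)

# Way 2: bash_command is a templated field, so the value is available via Jinja.
bash_task = BashOperator(
    task_id="echo_conf",
    bash_command="echo {{ dag_run.conf['key'] }}",
    dag=dag,
)

Trigger it with $ airflow trigger_dag conf_demo --conf '{"key":"value"}' and both tasks should print/echo the passed value.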

Working with PostgreSQL on Mac

Statement : The sole purpose of this post is to learn how to install the PostgreSQL database on a Mac machine and the basic commands to play with the DB.

PostgreSQL Installation :
                   $ brew install postgresql

Connect to the database (you will be prompted for the password):
                    psql -h hostname -U username -d databasename

Frequent list of commands to be used:
\l -> To list all the databases
\du -> To list all the users/roles
\dt -> To show all tables in the working database
\q -> To quit the prompt

DDL AND DML commands:
$ CREATE DATABASE database_name;
$ SELECT * FROM table_name;
$ CREATE USER user_name WITH ENCRYPTED PASSWORD 'password';
$ GRANT ALL PRIVILEGES ON DATABASE db_name TO user_name;
$ GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA public TO user_name;
Restart the PostgreSQL server:
$ brew services restart postgresql
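The same database can also be reached from Python. Below is a minimal psycopg2 sketch (the driver already referenced in the sql_alchemy_conn string earlier); the host, credentials, database and query are placeholders –

import psycopg2

# Connect with the same details used for psql above.
connection = psycopg2.connect(
    host="hostname",
    user="user_name",
    password="password",
    dbname="database_name",
)

cursor = connection.cursor()
cursor.execute("SELECT * FROM table_name;")
for row in cursor.fetchall():
    print(row)

cursor.close()
connection.close()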

 

Install Redis on Mac

Statement : The sole purpose of this post is to install Redis on a Mac machine and, in addition, to show how to get and set values through Redis.

Installation command through Homebrew:

$ brew install redis

  • Use the below Python code to set a key's value in Redis –

import redis

def cache_latest_records(ds, **kwargs):
    # iso2 (the key) and rate (the value) are assumed to be computed earlier in the task
    redis_connection = redis.StrictRedis()
    redis_connection.set(iso2, rate)

  • Use the below command to check the cached result from the command line –

$ echo 'get "KEY"' | redis-cli
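The same lookup can also be done from Python with the redis client used above – a minimal sketch, where "KEY" stands for whatever key was set –

import redis

# Read back the cached value; returns None if the key does not exist.
redis_connection = redis.StrictRedis()
print(redis_connection.get("KEY"))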

Either way, you will get the value of the key that was set through the Redis connection. Hope it helps. Rock 🙂