Capture Postgres database packets through Wireshark on local machine

Statement : The sole purpose of this post is to install Wireshark on a Windows/Linux/Mac machine and analyse Postgres database packets on the same local machine.

Installation :

  • Install Wireshark from the link for Windows/Mac machines.
  • Follow the steps below to install it on a Linux machine (change the command based on your distribution; a couple of examples for other distributions are sketched after this list) –
  1. To install -> sudo apt-get install wireshark
  2. To run -> sudo wireshark
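
For reference, the equivalent command on dnf/yum-based distributions is sketched below; the package is usually just called wireshark, though on some RHEL/CentOS releases the GUI ships in a separate package, so check your distribution's repositories.

    # Fedora / CentOS (use yum instead of dnf on older releases)
    sudo dnf install wireshark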

Steps to capture packets :

  • As soon as Wireshark starts, it shows you the list of interfaces on which you can capture packets.
  • To begin with, since everything (the Postgres setup) is installed on the local machine, you need to select the Loopback: lo0 interface as shown in the screenshot below –

[Screenshot: Wireshark interface list with Loopback: lo0 selected]

  • After selecting the appropriate interface, apply a suitable filter to capture only the packets you care about. In this case, I will show you how to capture Postgres database packets; refer to the screenshot below (and the command-line sketch after it) –

[Screenshot: applying the Postgres filter in Wireshark]
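
If you prefer the command line, here is a minimal sketch of the same capture with tshark (Wireshark's CLI). It assumes Postgres is listening on the default port 5432 and that the loopback interface is named lo0 (on Linux it is usually lo); connecting with psql over TCP generates some traffic to look at, since a Unix-socket connection would not appear on the loopback interface.

    # Capture loopback traffic on the default Postgres port and show only pgsql messages
    sudo tshark -i lo0 -f "tcp port 5432" -Y "pgsql"

    # In another terminal, force a TCP connection to generate traffic
    # (user, database and query are just placeholders – use whatever exists locally)
    psql -h 127.0.0.1 -p 5432 -U postgres -d postgres -c "SELECT * FROM pg_tables LIMIT 5;"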

  • Now it’s time to analyse the packets. Look at line 66, whose Info column reads <1/2/T/D/C/Z. This is the whole response we get from the Postgres database, and this sequence repeats for each query within this cycle. 1 stands for Parse completion, 2 stands for Bind completion, T stands for the Row Description, which gives you the number of columns and the internal information about the column schema (like the OID – Object Identifier, the column index, the type OID etc.), and D stands for the Data row, in which you can see your actual data. Follow the screenshot below for help on this.

[Screenshot: expanded PostgreSQL protocol messages in Wireshark]

  • C stands for command completion and Z stands for query readiness.
  • In the same way, you can analyse packets for any kind of traffic (TCP, UDP, HTTPS and so on) and, in turn, get a basic understanding of how packets travel across the network. You can dig deeper to get detailed information about each packet. You can even use the Fiddler tool to do a similar job.
  • Hope it works for you. Rock 🙂

Know your Mac machine’s IP through the terminal

Statement : The purpose of this post is to get the Mac machine’s IP address.

Solutions : There are various ways to know your Mac machine’s IP address –

  • For a wireless connection : ipconfig getifaddr en1
  • For a wired connection : ipconfig getifaddr en0
  • Another way is :
ifconfig | grep "inet " | grep -Fv 127.0.0.1 | awk '{print $2}' 
  • To get your public IP : curl ifconfig.me
  • The common way is to execute the following command –

    ifconfig | grep inet

It will print several rows; the inet entry that is not 127.0.0.1 is your machine’s IP. Hope it helps. 🙂

Error while loading shared libraries in CentOS using Docker Compose command

Statement : While running the Docker Compose command on my CentOS machine, I got this error: “docker-compose: error while loading shared libraries: libz.so.1: failed to map segment from shared object: Operation not permitted”.

Solution : This usually happens because /tmp is mounted with the noexec flag, while docker-compose needs to extract and load shared libraries from /tmp. You just need to remount the /tmp directory with exec permission. Use the following command for that –

sudo mount /tmp -o remount,exec
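
To double-check that the remount worked, look at the mount flags for /tmp; noexec should no longer be listed –

    # The flags printed for /tmp should no longer include noexec
    mount | grep /tmp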

Hope this helps to resolve this issue. 🙂

Working with rclone to sync remote storage (AWS, Azure etc.) with the local machine

Statement : The sole purpose of this post is to learn how to keep remote data stored in AWS, Azure Blob Storage etc. in sync with the local file system.

Installation : Install rclone from the link based on your machine (Windows, Linux, Mac etc). I worked on a Mac, so I downloaded the corresponding build.

Steps : In my case, I have stored files in Azure Blob Storage as well as in an AWS S3 bucket. Given below are the steps to keep that data in sync with a local directory.

  • Go to the downloaded folder and execute the following command to configure rclone –

tangupta-mbp:rclone-v1.39-osx-amd64 tangupta$ ./rclone config

  • Initially there will be no remotes found, so you need to create a new one.
No remotes found - make a new one
n) New remote
s) Set configuration password
q) Quit config
n/s/q> n
name> remote
  • Now it’ll ask for the type of storage (AWS, Azure, Box, Google Drive etc.) to configure. I have chosen Azure Blob Storage.
Storage> azureblob
  • Now it’ll ask for the details of the Azure Blob Storage account, like the account name, key and endpoint (keep it blank).
Storage Account Name
account> your_created_account_name_on azure
Storage Account Key
key> generated_key_to_be_copied_through_azure_portal
Endpoint for the service - leave blank normally.
endpoint> 
--------------------
y) Yes this is OK
e) Edit this remote
d) Delete this remote
y/e/d> y
  • To list all the containers created on the Azure portal under this account name –
tangupta$./rclone lsd remote:

             -1 2018-02-05 12:37:03        -1 test

  • To list all the files uploaded or created under the container (test in my case) –
tangupta$./rclone ls remote:test

    90589 Gaurav.pdf

    48128 Resume shashank.doc

    26301 Resume_Shobhit.docx

    29366 Siddharth..docx

  • Most importantly, now use the below command to sync the local file system to the remote container, deleting any excess files in the container.

tangupta$./rclone sync /Users/tanuj/airflow/dag remote:test

The good thing about rclone sync is that it only transfers the content that has changed. In the same way, you can configure an AWS S3 remote and keep files in sync with it (see the sketch below). Apart from these commands, rclone also provides copy, move and delete commands to do the respective jobs.
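
For example, once an S3 remote has been set up through ./rclone config, the same commands work against AWS; the remote and bucket names below are only placeholders, and --dry-run lets you preview what would change first.

    # Assuming a remote named "s3remote" pointing at AWS S3 and a bucket named "my-bucket"
    ./rclone lsd s3remote:                                                    # list buckets
    ./rclone sync /Users/tanuj/airflow/dag s3remote:my-bucket/dag --dry-run   # preview changes
    ./rclone sync /Users/tanuj/airflow/dag s3remote:my-bucket/dag             # perform the sync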

Hope this works for you. Enjoy 🙂

Working with Sqlite Database

Statement : The main purpose of this post is to learn how to install Sqlite on a Mac machine and play with the basic commands.

Installation :

Use the following command to install sqlite –

brew install sqlite3

Frequently Used Sqlite Commands :

Here are a few steps to open an SQLite database and work with its tables:

  1. Start the Sqlite database through command prompt –
    tangupta$ sqlite3
  2. Open an existing sqlite database file (pass its path) –
    sqlite> .open "/Users/tangupta/database.db"
  3. Get the full table content –
    SELECT * FROM tablename;
  4. List all of the available SQLite prompt commands –
    .help
  5. List the tables in your database –
    .tables
  6. Create table in the database –
    CREATE TABLE tableName (id integer, name text);
  7. Insert the data into the created table –
    INSERT INTO tableName VALUES (1, 'Tanuj');
  8. Add a column to the table –
     ALTER TABLE tableName ADD COLUMN isMarried char(1);
  9. Update column data in the table –
     UPDATE tableName SET column1 = 'value1', column2 = 'value2' WHERE condition;
  10. Exit from the sqlite database –
     .exit
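
Putting the commands above together, here is a minimal end-to-end session driven from the shell; the database path and table are only examples.

    # Create a throw-away database, add a table and some data, then query it back
    sqlite3 /tmp/demo.db "CREATE TABLE users (id integer, name text);"
    sqlite3 /tmp/demo.db "INSERT INTO users VALUES (1, 'Tanuj');"
    sqlite3 /tmp/demo.db "ALTER TABLE users ADD COLUMN isMarried char(1);"
    sqlite3 /tmp/demo.db "UPDATE users SET isMarried = 'Y' WHERE id = 1;"
    sqlite3 /tmp/demo.db "SELECT * FROM users;"

The last command should print something like 1|Tanuj|Y.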

Hope this helps you to get the basic understanding of sqlite database. Enjoy 🙂

Working in distributed mode with Airflow using Local and Celery Executor

Working with Local Executor:

LocalExecutor is widely used when there is a moderate amount of jobs to execute. With this executor, the worker picks up jobs and runs them locally via multiprocessing.

  1. Install PostgreSQL or MySQL to support parallelism with any executor other than the Sequential one. Use the following command to do so – $ brew install postgresql.
  2. Modify the configuration in AIRFLOW_HOME/airflow.cfg

Change the executor to Local Executor

executor = LocalExecutor

Change the meta db configuration

sql_alchemy_conn = postgresql+psycopg2://user_name:password@host_name/database_name
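
The connection string above assumes the metadata database and user already exist; if they do not, a quick way to create them (all names below are only placeholders) is –

    # Create a metadata database and a user for Airflow
    createdb airflow_db
    psql -d airflow_db -c "CREATE USER airflow_user WITH ENCRYPTED PASSWORD 'airflow_pass';"
    psql -d airflow_db -c "GRANT ALL PRIVILEGES ON DATABASE airflow_db TO airflow_user;"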

3. Restart Airflow to test your DAGs

$ airflow initdb
$ airflow webserver
$ airflow scheduler

  4. Establish the DB connection via the Airflow Admin UI –
  • Go to the Airflow Admin UI: Admin -> Connection -> Create
    Connection ID : name of your connection, used to create tasks inside the DAG
    Connection Type : Postgres
    Host : database server IP / localhost
    Schema : Database_Name
    Username : User_Name
    Password : Password
  • Encrypt your credentials

Generate a valid Fernet key and place it into airflow.cfg

             FERNET_KEY=$(python -c "from cryptography.fernet import Fernet; FERNET_KEY = Fernet.generate_key().decode(); print(FERNET_KEY)")
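
For reference, the generated value goes into airflow.cfg; a minimal sketch, assuming the standard [core] section –

    # In $AIRFLOW_HOME/airflow.cfg, under the [core] section, set:
    #   fernet_key = <the key generated above>
    grep fernet_key $AIRFLOW_HOME/airflow.cfg   # verify it was picked up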

Working with Celery Executor:

CeleryExecutor is the best choice in production when there is a heavy amount of jobs to execute. With this executor, remote workers pick up the jobs and run them as scheduled, with load balancing across workers.

  • Install and configure the message queuing/passing engine on the Airflow server: RabbitMQ/Redis/etc. –

  1. Install RabbitMQ using $ brew install rabbitmq
  2. Add the following path to your .bash_profile or .profile – PATH=$PATH:/usr/local/sbin

  3. Start the RabbitMQ server using one of the following commands –
    $ sudo rabbitmq-server            # run in the foreground; or
    $ sudo rabbitmq-server -detached  # run in the background

  4. Configure RabbitMQ: create a user and grant privileges –
    $ rabbitmqctl add_user rabbitmq_user_name rabbitmq_password
    $ rabbitmqctl add_vhost rabbitmq_virtual_host_name
    $ rabbitmqctl set_user_tags rabbitmq_user_name rabbitmq_tag_name
    $ rabbitmqctl set_permissions -p rabbitmq_virtual_host_name rabbitmq_user_name ".*" ".*" ".*"

  5. Make the RabbitMQ server open to remote connections –
    Go to /usr/local/etc/rabbitmq/rabbitmq-env.conf and change NODE_IP_ADDRESS from 127.0.0.1 to 0.0.0.0
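
A quick sanity check that the broker is up and the user/vhost were created correctly, using rabbitmqctl –

    # Broker status plus the configured users, vhosts and permissions
    sudo rabbitmqctl status
    sudo rabbitmqctl list_users
    sudo rabbitmqctl list_vhosts
    sudo rabbitmqctl list_permissions -p rabbitmq_virtual_host_name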

  • Modify the configuration in AIRFLOW_HOME/airflow.cfg –
    1. Change the executor to Celery Executor
    executor = CeleryExecutor
    2. Set up the RabbitMQ broker URL and the Celery result backend
    broker_url = amqp://rabbitmq_user_name:rabbitmq_password@host_name/rabbitmq_virtual_host_name # host_name=localhost on the server
    celery_result_backend = the meta DB URL (as configured in sql_alchemy_conn above), the RabbitMQ broker URL (same as above), or any other eligible result backend
  • Open the meta DB (PostgreSQL) to remote connections
    1. Modify /usr/local/var/postgres/pg_hba.conf to add a client authentication record –

    host    all    all    0.0.0.0/0    md5    # 0.0.0.0/0 stands for all IPs; use a CIDR address to restrict access; md5 enables password authentication

    2. Change the listen address in /usr/local/var/postgres/postgresql.conf –
        listen_addresses = '*'
    3. Create a user and grant privileges (run the statements below as the postgres superuser) –
        CREATE USER your_postgres_user_name WITH ENCRYPTED PASSWORD 'your_postgres_pwd';
        GRANT ALL PRIVILEGES ON DATABASE your_database_name TO your_postgres_user_name;
        GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA public TO your_postgres_user_name;

    4. Restart the PostgreSQL server and test it out –
        $ brew services restart postgresql
        $ psql -U [postgres_user_name] -h [postgres_host_name] -d [postgres_database_name]

  • IMPORTANT: update your sql_alchemy_conn string in airflow.cfg.
  • Start your Airflow workers; on each worker, run: $ airflow worker (see the start-up sketch after this list).
  • Your Airflow workers should now be picking up and running jobs from the Airflow server.
  • Use the github link to go through all the samples. Enjoy Coding 🙂
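
To recap the processes involved (as referenced in the worker step above), here is a minimal start-up sketch; the webserver port shown is just the default –

    # On the Airflow server
    airflow webserver -p 8080
    airflow scheduler

    # On each worker machine (airflow.cfg must point at the same meta DB and broker)
    airflow worker

    # Optional: Celery monitoring UI
    airflow flower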

Passing and accessing run-time arguments to an Airflow DAG through the CLI:

  • One can pass run time arguments at the time of triggering the DAG using below command –
                   $ airflow trigger_dag dag_id --conf '{"key":"value" }'

  • Now, there are two ways in which one can access the parameters passed to the airflow trigger_dag command –
  1. In the callable method defined in the operator, one can access the params as kwargs['dag_run'].conf.get('key')
  2. If the field where you are using this is a templated field, one can use {{ dag_run.conf['key'] }}

Note : The schedule_interval of the externally triggerable DAG must be set to None for the above approaches to work. Use the github link to go through all the samples. Enjoy Coding