
Working with rclone to sync remote machine files (AWS, Azure, etc.) with the local machine

Statement : The sole purpose of this post is to learn how to keep the remote data stored in AWS S3, Azure Blob Storage, etc. in sync with the local file system.

Installation : Install rclone from the link based on your machine (Windows, Linux, macOS, etc.). I have worked on macOS, so I downloaded the respective file.

Steps : In my case, I have stored my files in Azure Blob Storage as well as in an AWS S3 bucket. Given below are the steps by which we can keep the data in sync with a local directory.

  • Go to the downloaded folder and execute the following command to configure rclone –

tangupta-mbp:rclone-v1.39-osx-amd64 tangupta$ ./rclone config

  • Initially there will be no remotes found, so you need to create a new one.
No remotes found - make a new one
n) New remote
s) Set configuration password
q) Quit config
n/s/q> n
name> remote
  • Now it’ll ask for the type of storage (AWS, Azure, Box, Google Drive, etc.) to configure. I have chosen Azure Blob Storage.
Storage> azureblob
  • Now it’ll ask for the details of the Azure Blob Storage account, like account name, key, endpoint (keep it blank), etc.
Storage Account Name
account> your_created_account_name_on_azure
Storage Account Key
key> generated_key_to_be_copied_through_azure_portal
Endpoint for the service - leave blank normally.
endpoint> 
--------------------
y) Yes this is OK
e) Edit this remote
d) Delete this remote
y/e/d> y
  • To list all the containers created on the Azure portal under this account name –
tangupta$./rclone lsd remote:

             -1 2018-02-05 12:37:03        -1 test

  • To list all the files uploaded or created under the container (test in my case) –
tangupta$./rclone ls remote:test

    90589 Gaurav.pdf
    48128 Resume shashank.doc
    26301 Resume_Shobhit.docx
    29366 Siddharth..docx

  • Most importantly, now use the below command to sync the local file system with the remote container, deleting any excess files in the container.

tangupta$./rclone sync /Users/tanuj/airflow/dag remote:test

The good thing about rclone sync is that it transfers the updated content only. In the same way, you can play with AWS S3 storage to sync files. Apart from these commands, rclone also provides copy, move and delete commands to do the respective jobs in the appropriate way.
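If you want to automate this sync (say, from a cron job or a script), below is a minimal Python sketch using the standard subprocess module – the paths and remote name are the ones used above, and the --dry-run flag lets you preview what would be deleted before a real run –

import subprocess

# Assumes the script runs from the rclone folder used above;
# drop --dry-run to actually transfer and delete files.
result = subprocess.run(
    ["./rclone", "sync", "/Users/tanuj/airflow/dag", "remote:test", "--dry-run"],
    capture_output=True, text=True,
)
print(result.stdout or result.stderr)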

Hope this works for you. Enjoy 🙂


Airflow Installation on macOS

Statement : The purpose of this post is to install Airflow on a macOS machine.

Airflow is a platform to programmatically author, schedule and monitor workflows. Use Airflow to author workflows as directed acyclic graphs (DAGs) of tasks. The Airflow scheduler executes your tasks on an array of workers while following the specified dependencies. (Taken from the Apache Airflow official page)

Installation Steps :

  •  Set up a home directory for Airflow using the below commands –
mkdir ~/Airflow 
export AIRFLOW_HOME=~/Airflow
  • As Airflow is written in Python, first make sure that Python is installed on the machine. If not, use the below commands to install it –
cd Airflow
brew install python python3
  •  Now install Airflow using pip (the package management system used to install and manage software packages written in Python) –
pip install airflow

Most probably, you will get an installation error like the one below when using the above command –

“Found existing installation: six 1.4.1

DEPRECATION: Uninstalling a distutils installed project (six) has been deprecated and will be removed in a future version. This is due to the fact that uninstalling a distutils project will only partially uninstall the project.

Uninstalling six-1.4.1:” 

  • So to avoid this, use the below command to install Airflow successfully –
pip install --ignore-installed six airflow
# To install extra packages based on your needs
pip install --ignore-installed six airflow[crypto] # For connection credentials security
pip install --ignore-installed six airflow[postgres] # For the PostgreSQL database
pip install --ignore-installed six airflow[celery] # For distributed mode: celery executor
pip install --ignore-installed six airflow[rabbitmq] # For message queuing and passing between the airflow server and workers
  • Even after executing the above command, you might get permission errors like “error: [Errno 13] Permission denied: ‘/usr/local/bin/mako-render’”. So give your user ownership of the folders touched by the above command –
sudo chown -R $USER /Library/Python/2.7
sudo chown -R $USER /usr/local/bin/

Airflow uses a SQLite database by default, which will be set up in parallel to create the necessary tables for tracking the status of DAGs (Directed Acyclic Graph – a collection of all the tasks you want to run, organised in a way that reflects their relationships and dependencies) and other related information.

  •  Now, as the last step, we need to initialise the SQLite database using the below command –
airflow initdb
  •  Finally, everything is done and it’s time to start the web server to play with the Airflow UI, using the below command –
airflow webserver -p 8080
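To verify the setup end to end, you can drop a minimal DAG into AIRFLOW_HOME/dags and watch it appear in the UI. Here is a small sketch (the file name hello_dag.py, the dag_id and the callable are my own illustration; import paths match the Airflow 1.x series used in this post) –

from datetime import datetime

from airflow import DAG
from airflow.operators.python_operator import PythonOperator

def print_hello():
    # Trivial task body, just to prove the scheduler can execute something
    return "Hello from Airflow!"

dag = DAG(
    dag_id="hello_dag",               # illustrative name
    start_date=datetime(2018, 1, 1),
    schedule_interval="@daily",
)

hello_task = PythonOperator(
    task_id="say_hello",
    python_callable=print_hello,
    dag=dag,
)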

Enjoy Airflow in your flow 🙂 Use the github link  to go through all the samples. Enjoy Coding!!

 


Host your application on the Internet

Statement : The sole purpose of this post is to learn how to host your application to the Internet so that anyone can access it across the world.

Solution :

  • Sign up for a Heroku account.
  • Download the Heroku CLI to host your application from your local terminal.
  • Log in to your account with your ID and password through the terminal, using the below command –

heroku login

  • Create a new repo on your GitHub account.
  • Now clone your repo onto your local machine using the below command –

git clone https://github.com/guptakumartanuj/Cryptocurrency-Concierge.git

  • It’s time to develop your application. Once it is done, push your whole code to your GitHub repo using the below commands –
  1. tangupta-mbp:Cryptocurrency-Concierge tangupta$ git add .
  2. tangupta-mbp:Cryptocurrency-Concierge tangupta$ git commit -m "First commit of Cryptocurrency Concierge"
  3. tangupta-mbp:Cryptocurrency-Concierge tangupta$ git push
  • Now you are ready to create a Heroku app. Use the below command for the same –
cd ~/workingDir
$ heroku create
Creating app... done, ⬢ any-random-name
https://any-random-name.herokuapp.com/ | https://git.heroku.com/any-random-name.git
  • Now push your application to Heroku using the below command –

tangupta-mbp:Cryptocurrency-Concierge tangupta$ git push heroku master

  • It’s time to access your hosted application using the URL highlighted above. But most probably you won’t be able to access it yet; make sure one instance of your hosted application is running. Use the below command to do so –

heroku ps:scale web=1

  • In case you get the below error while running the above command, you need to create a file named Procfile (with no extension), add it to the git repo, and push the repo to Heroku again.

Scaling dynos… !

    Couldn’t find that process type.

  • In my case, to run my Spring Boot application, I added the following line to the Procfile –

          web: java $JAVA_OPTS -Dserver.port=$PORT -jar target/*.war

  • Finally, your application should be up and running. In case you face any issues while pushing or running your application, you can check the Heroku logs, which will help you troubleshoot, using the below command –

heroku logs --tail

Enjoy coding and Happy Learning 🙂 

 


Redirect local IP (web application) to Internet (Public IP)

Statement : The purpose of this post is to expose your locally running application to the Internet. In other words, there is a requirement to redirect the local IP to the Internet (a public IP).

Solution :

  1.  Download ngrok on your machine.
  2.  Let’s say my application is running locally (localhost/127.0.0.1) on port 8080 and I want to make it visible publicly so that other users can access it. Use the below command to get the public URL.

           tangupta-mbp:Downloads tangupta$ ./ngrok http 8080

The above command will produce console output like the below –

ngrok by @inconshreveable                                                                                                      

Session Status                connecting
Version                       2.2.8
Region                        United States (us)
Web Interface                 http://127.0.0.1:4040

Forwarding                    http://23b81bac.ngrok.io -> localhost:8080   

Forwarding                    https://23b81bac.ngrok.io -> localhost:8080

  3. Now you will be able to access your application using the highlighted http or https URL above.
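If you need the generated public URL programmatically (for example, to register a webhook), ngrok also exposes a local inspection API behind the Web Interface address shown above. A minimal Python sketch, assuming the requests package is installed –

import requests

# Ask ngrok's local inspection API (port 4040) for the active tunnels
tunnels = requests.get("http://127.0.0.1:4040/api/tunnels").json()["tunnels"]
for tunnel in tunnels:
    print(tunnel["proto"], tunnel["public_url"])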

Hope it works for you and fulfils your purpose of accessing your application publicly. Enjoy Learning 🙂


How to enable debugging through Eclipse/STS


  1. First, add the below lines in php.ini –

;[XDebug]
;zend_extension = "C:\xampp\php\ext\php_xdebug.dll"
;xdebug.remote_enable = 1
;xdebug.remote_autostart=1
;xdebug.remote_host=localhost
;xdebug.remote_port=9000

A semicolon (;) is used to comment out a line – remove the leading semicolons to enable these settings.

  2. Now go to STS –

Right Click on the Box project -> Debug As -> Debug Configurations -> PHP Web Application -> New

Name it as Box_Integration or whatever you want –

In the Server tab -> PHP Server Configure -> Configure

Server Tab ->

Server Name : other.local-dev.creativesign.com (change yours)

Base URL : http://other.local-dev.creativesign.com:447 (change yours)

Document Root : Browse the root directory of the php project (My path – C:\xampp\htdocs\other.local-dev.creativesign.com)

Debugger Tab ->

Debugger : XDebug

Port : 9000

Path Mapping Tab ->

Path On Server :  C:\xampp\htdocs\other.local-dev.creativesign.com

Path in Workspace : /echosign-box-integration

Now click Finish and come back to the main Server tab.

In File : Give the path of the PHP page which you want to debug – /echosign-box-integration/src/Adobe/EchoSign/BoxBundle/Api/EchoSign.php

URL : http://other.local-dev.creativesign.com:447/ mapped to /

Now Enjoy debugging.

Note* – If you are stuck at the 2nd line of app.php or app_dev.php while debugging, go to the preferences of your IDE (Eclipse in my case) and search for “debug”. Click on Debug under PHP; you can see that “Break at First Line” is checked by default. You need to uncheck it. Hope the problem is now solved.

Capture Postgres database packets through Wireshark on local machine

Statement : The sole purpose of this post is to install Wireshark on the Windows/Linux/MAC machine and analyse Postgres database packets on the same local machine.

Installation :

  • Install Wireshark from the link for a Windows/macOS machine.
  • Follow the below steps to install the same on a Linux machine (change the command based on your distribution) –
  1. To install -> sudo apt-get install wireshark
  2. To run -> sudo wireshark

Steps to capture packets :

  • As soon as Wireshark starts, it will show you the list of interfaces through which you can capture packets.
  • To begin with, as everything (the Postgres setup) is installed on the local machine, you need to select the Loopback: lo0 interface, as shown in the screenshot below –

[Screenshot: selecting the Loopback: lo0 interface]

  • After selecting the appropriate interface, apply the appropriate filter to capture the packets. In this case, I am going to show you how to capture Postgres database packets, referring to the below screenshot.

[Screenshot: applying the filter for Postgres packets]

  • Now it’s time to analyse the packets. Look at line 66, whose info column reads <1/2/T/D/C/Z. This is the whole row which we get from the Postgres database, and this cycle happens for each query. 1 stands for Parse completion, 2 stands for Bind completion, T stands for the Row Description, which gives you the number of columns along with internal information about the column schema (like OID – Object Identifier, column index, TYPE OID, etc.), and D stands for the Data Row, in which you can see your exact data. Follow the below screenshot for help on this. (A small script to generate this traffic is sketched after this list.)

[Screenshot: packet details showing the 1/2/T/D/C/Z Postgres messages]

  • C stands for Command Completion and Z stands for Query Readiness (ready for query).
  • In the same way, you can analyse packets for any traffic (TCP, UDP, HTTPS and the like) and in turn get a basic understanding of how packets travel across the network. You can dig deeper into it to get detailed information about the packets. You can even use the Fiddler tool to do the same job.
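To generate this message flow yourself while the capture is running, any simple query against the local Postgres will do. Below is a minimal Python sketch using psycopg2 (the connection parameters and table name are placeholders – adjust them to your setup). Note that, depending on whether the driver uses the simple or the extended query protocol, you may see a single Q message followed by T/D/C/Z instead of the full Parse/Bind flow –

import psycopg2

# Placeholder credentials - point these at your local Postgres instance
conn = psycopg2.connect(host="127.0.0.1", dbname="testdb",
                        user="postgres", password="postgres")
cur = conn.cursor()
cur.execute("SELECT * FROM some_table;")  # watch the resulting messages on lo0
for row in cur.fetchall():
    print(row)
cur.close()
conn.close()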
Hope it works for you. Rock 🙂

Know your macOS machine’s IP through the terminal

Statement : The purpose of this post is to get the macOS machine’s IP address.

Solutions : There are various ways to know your macOS machine’s IP address –

  • For Wireless connection : ipconfig getifaddr en1
  • For Wired connection : ipconfig getifaddr en0
  • Another way is –
ifconfig | grep "inet " | grep -Fv 127.0.0.1 | awk '{print $2}' 
  • curl ifconfig.me (note that this returns your public IP as seen from the Internet)
  • The common way is to execute the following command –

    ifconfig | grep inet

It will give you several rows, but the row having the “inet” keyword holds your machine’s desired IP.
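For completeness, the same information can be obtained programmatically. A small Python sketch using the standard socket module – the UDP “connect” below does not actually send any packets; it just asks the OS which local IP it would route through –

import socket

s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
s.connect(("8.8.8.8", 80))   # no traffic is sent; this only selects a route
print(s.getsockname()[0])    # the machine's IP on the chosen interface
s.close()

Hope it helps. 🙂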

Error while loading shared libraries in CentOS using Docker Compose command

Statement : While running the Docker Compose command on my CentOS machine, I got this error: “docker-compose: error while loading shared libraries: libz.so.1: failed to map segment from shared object: Operation not permitted”.

Solution : You just need to remount the /tmp directory with exec permission. Use the following command for the same –

sudo mount /tmp -o remount,exec

Hope this helps to resolve this issue. 🙂

Working with SQLite Database

Statement : The main purpose of this post is to learn how to install SQLite on a macOS machine and play with the basic commands.

Installation :

Use the following command to install SQLite –

brew install sqlite3

Frequently Used SQLite Commands :

There are a few steps to start working with an SQLite database:

  1. Start the SQLite shell through the command prompt –
    tangupta$ sqlite3
  2. Open a SQLite database file (give the path of the database file) –
    sqlite> .open "/Users/tangupta/database.db"
  3. Get the full table content –
    SELECT * FROM tablename;
  4. List all of the available SQLite prompt commands –
    .help
  5. List the tables in your database –
    .tables
  6. Create table in the database –
    CREATE TABLE tableName (id integer, name text);
  7. Insert the data into the created table –
    INSERT INTO tableName VALUES (1, 'Tanuj');
  8. Add a column to the table –
     ALTER TABLE tableName ADD COLUMN isMarried char(1);
  9. Update column data in the table –
    UPDATE tableName SET column1 = 'value1', column2 = 'value2' WHERE condition;
  10. Exit from the sqlite database –
     .exit
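The same operations can also be driven from code using Python's built-in sqlite3 module. A minimal sketch (the file path and values mirror the examples above) –

import sqlite3

conn = sqlite3.connect("/Users/tangupta/database.db")  # opens or creates the file
cur = conn.cursor()
cur.execute("CREATE TABLE IF NOT EXISTS tableName (id integer, name text)")
cur.execute("INSERT INTO tableName VALUES (?, ?)", (1, 'Tanuj'))
conn.commit()
for row in cur.execute("SELECT * FROM tableName"):
    print(row)
conn.close()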

Hope this helps you to get the basic understanding of sqlite database. Enjoy 🙂

Working in distributed mode with Airflow using Local and Celery Executor

Working with Local Executor:

LocalExecutor is widely used when there is a moderate amount of jobs to be executed. In this mode, the worker picks up jobs and runs them locally via multiprocessing.

  1. You need to install PostgreSQL or MySQL to support parallelism using any executor other than the SequentialExecutor. Use the following command to do so – $ brew install postgresql
  2. Modify the configuration in AIRFLOW_HOME/airflow.cfg

Change the executor to Local Executor

executor = LocalExecutor

Change the meta db configuration

sql_alchemy_conn = postgresql+psycopg2://user_name:password@host_name/database_name

3. Restart airflow to test your dags –
$ airflow initdb
$ airflow webserver
$ airflow scheduler

  4. Establish the db connections via the Airflow Admin UI –
  • Go to the Airflow Admin UI: Admin -> Connection -> Create
Connection ID : name of your connection, used to create tasks inside the DAG
Connection Type : Postgres
Host : database server IP / localhost
Schema : Database_Name
Username : User_Name
Password : Password
  • Encrypt your credentials

Generate a valid Fernet key and place it into airflow.cfg

             FERNET_KEY=$(python -c "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())")
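Once the connection is saved, a task can use it through a hook instead of hard-coding credentials. A minimal sketch (the connection ID my_postgres_conn and the query are illustrative; the import path matches the Airflow 1.x series used in this post) –

from airflow.hooks.postgres_hook import PostgresHook

def fetch_rows():
    # 'my_postgres_conn' must match the Connection ID created in the Admin UI
    hook = PostgresHook(postgres_conn_id="my_postgres_conn")
    return hook.get_records("SELECT 1;")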

Working with Celery Executor:

CeleryExecutor is the best choice in production when there is a heavy amount of jobs to be executed. In this mode, remote workers pick up the jobs and run them as scheduled, with load balancing.

  • Install and configure the message queuing/passing engine on the airflow server: RabbitMQ/Redis/etc. –

  1. Install RabbitMQ using $ brew install rabbitmq
  2. Add the following path to your .bash_profile or .profile – PATH=$PATH:/usr/local/sbin

  3. Start the RabbitMQ server using the following commands –
    $ sudo rabbitmq-server # run in foreground; or
    $ sudo rabbitmq-server -detached # run in background

  4. Configure RabbitMQ: create a user and grant privileges –
    $ rabbitmqctl add_user rabbitmq_user_name rabbitmq_password
    $ rabbitmqctl add_vhost rabbitmq_virtual_host_name
    $ rabbitmqctl set_user_tags rabbitmq_user_name rabbitmq_tag_name
    $ rabbitmqctl set_permissions -p rabbitmq_virtual_host_name rabbitmq_user_name ".*" ".*" ".*"

  5. Make the RabbitMQ server open to remote connections –
    Go to /usr/local/etc/rabbitmq/rabbitmq-env.conf and change NODE_IP_ADDRESS from 127.0.0.1 to 0.0.0.0

  • Modify the configuration in AIRFLOW_HOME/airflow.cfg –
    1. Change the executor to Celery Executor
    executor = CeleryExecutor
    2. Set up the RabbitMQ broker url and celery result backend
    broker_url = amqp://rabbitmq_user_name:rabbitmq_password@host_name/rabbitmq_virtual_host_name # host_name=localhost on the server
    celery_result_backend = the meta db url (as configured in step 2 above), or the RabbitMQ broker url (same as above), or any other eligible result backend
  • Open the meta DB (PostgreSQL) to remote connections
    1. Modify /usr/local/var/postgres/pg_hba.conf to add a Client Authentication Record –
    host    all    all    0.0.0.0/0    md5 # 0.0.0.0/0 stands for all IPs; use a CIDR address to restrict access; md5 for password authentication

    2. Change the listen address in /usr/local/var/postgres/postgresql.conf –
    listen_addresses = '*'
    3. Create a user and grant privileges (run the commands below in psql as the postgres superuser) –
    CREATE USER your_postgres_user_name WITH ENCRYPTED PASSWORD 'your_postgres_pwd';
    GRANT ALL PRIVILEGES ON DATABASE your_database_name TO your_postgres_user_name;
    GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA public TO your_postgres_user_name;

    4. Restart the PostgreSQL server and test it out –
    $ brew services restart postgresql
    $ psql -U [postgres_user_name] -h [postgres_host_name] -d [postgres_database_name]

  •  IMPORTANT: update your sql_alchemy_conn string in airflow.cfg
  • Start your Airflow workers; on each worker machine, run: $ airflow worker
  • Your Airflow workers should now be picking up and running jobs from the airflow server.
  • Use the github link  to go through all the samples. Enjoy Coding 🙂

Passing and accessing run-time arguments to an Airflow DAG through the CLI:

  • One can pass run-time arguments at the time of triggering the DAG using the below command –
                   $ airflow trigger_dag dag_id --conf '{"key":"value"}'

  • Now, there are two ways in which one can access the parameters passed to the airflow trigger_dag command (both are sketched below) –
  1. In the callable method defined in the operator (e.g. a PythonOperator with provide_context=True), one can access the params as kwargs['dag_run'].conf.get('key')
  2. Provided the field where you are using it is a templated field, one can use {{ dag_run.conf['key'] }}
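A minimal sketch showing both approaches in one DAG (the dag_id and task names are illustrative; import paths match the Airflow 1.x series used in this post) –

from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from airflow.operators.python_operator import PythonOperator

def read_conf(**kwargs):
    # Approach 1: read the value passed via --conf inside the callable
    return kwargs["dag_run"].conf.get("key")

dag = DAG(
    dag_id="trigger_with_conf",   # illustrative name
    start_date=datetime(2018, 1, 1),
    schedule_interval=None,       # externally triggered (see the note below)
)

python_task = PythonOperator(
    task_id="read_conf",
    python_callable=read_conf,
    provide_context=True,         # makes dag_run available in kwargs (Airflow 1.x)
    dag=dag,
)

# Approach 2: bash_command is a templated field, so Jinja can read dag_run.conf
bash_task = BashOperator(
    task_id="echo_conf",
    bash_command="echo {{ dag_run.conf['key'] }}",
    dag=dag,
)

It can then be triggered with $ airflow trigger_dag trigger_with_conf --conf '{"key":"value"}'.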

Note* : The schedule_interval for the externally trigger-able DAG must be set to None for the above approaches to work. Use the github link to go through all the samples. Enjoy Coding