Working with Postgres on Mac

Statement: The sole purpose of this post is to learn how to install the PostgreSQL database on a Mac machine and the basic commands to play with the DB.

Postgres installation:
$ brew install postgresql

Connect to the database:
$ psql -h hostname -U username -d databasename
(psql prompts for the password; note that -p sets the port, not the password)

Frequent list of commands to be used:
\l -> To list all the databases
\du -> To list all the users/roles
\dt -> To show all tables in the working database
\q -> To quit the prompt

DDL and DML commands (run inside the psql prompt):
CREATE DATABASE database_name;
SELECT * FROM table_name;
CREATE USER user_name WITH ENCRYPTED PASSWORD 'password';
GRANT ALL PRIVILEGES ON DATABASE db_name TO user_name;
GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA public TO user_name;

Restart the PostgreSQL server:
$ brew services restart postgresql
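
Once the database and user are created, you can verify the setup from Python as well. The snippet below is a minimal sketch using the psycopg2 driver (assumed to be installed, e.g. via pip install psycopg2-binary); the host, database, user and password values are placeholders matching the examples above.

import psycopg2

# Connection parameters are placeholders; replace them with your own values.
conn = psycopg2.connect(host="localhost", dbname="db_name",
                        user="user_name", password="password")
cur = conn.cursor()
cur.execute("SELECT version();")  # simple query to verify the connection works
print(cur.fetchone())
cur.close()
conn.close()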

 


Install Redis on Mac

Statement: The sole purpose of this post is to install Redis on a Mac machine and, in addition, to show how to set and get values through Redis.

Installation command through Homebrew:

$ brew install redis

  • Use the below Python code to set a key/value pair in Redis –

import redis

def cache_latest_records(ds, **kwargs):
    # iso2 (the key) and rate (the value) are assumed to be computed earlier in the task
    redis_connection = redis.StrictRedis()   # connects to localhost:6379 by default
    redis_connection.set(iso2, rate)

  • Use the below command to check the cached result in the cache –

$ echo 'get "KEY"' | redis-cli
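
The same check can also be done from Python with the redis-py client. This is a minimal sketch, assuming the key name "KEY" used in the command above:

import redis

redis_connection = redis.StrictRedis()
value = redis_connection.get("KEY")   # returns the value as bytes, or None if the key is missing
print(value)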

Finally, you will get the value of the key that was set through the Redis connection. Hope it helps. Rock 🙂

Working with Apache Airflow, DAG, Sensor and XCom

Airflow Directory Structure:

AIRFLOW_HOME
├── airflow.cfg    <- Airflow's default configuration; edit it for any setting related to the executor, brokers, etc.
├── airflow.db     <- Information about the metadata database (SQLite DB by default) once Airflow initializes the DB
├── dags           <- DAGs directory; all DAGs are kept inside this folder
│   ├── test.py                  <- test DAG python file (ensure it compiles before running the DAG)
│   └── test_first_operators.py  <- test first-operator DAG python file (ensure it compiles before running the DAG)
├── plugins
│   ├── first_operators.py  <- First Operator python file (ensure it compiles before running it)
│   └── first_sensor.py     <- First Sensor python file (ensure it compiles before running it)
└── unittests.cfg  <- Default configuration related to unit tests

Steps to run the DAG and task:

  • As per the above directory structure, we just need to place the DAG file inside the dags folder of AIRFLOW_HOME. Just make sure the file compiles successfully.
  • Now start the Airflow scheduler by issuing the following command –
    $ airflow scheduler
  • Once the scheduler is started, it will send tasks for execution based on the executor defined in the Airflow config file. By default, tasks are scheduled by the SequentialExecutor (which has nothing to do with concurrency). To achieve parallelism, one should go with either the CeleryExecutor or the MesosExecutor for robustness.
  • In order to start the DAG, go to the Admin UI and turn the DAG on. Now either trigger the DAG from the UI or use the below commands to run it (a minimal test.py matching these commands is sketched after them) –

# run your first task instance
$ airflow run test task1 2018-01-20

# run a backfill over 2 days
$ airflow backfill test -s 2018-01-21 -e 2018-01-22
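
For reference, here is a minimal sketch of what dags/test.py could look like so that the commands above have a DAG named test with a task named task1 to run. The BashOperator, schedule interval and start date used here are assumptions for illustration (Airflow 1.x style imports):

from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

default_args = {
    'owner': 'airflow',
    'start_date': datetime(2018, 1, 20),   # matches the dates used in the commands above
}

# DAG id 'test' and task id 'task1' match the airflow run/backfill commands above
dag = DAG('test', default_args=default_args, schedule_interval='@daily')

task1 = BashOperator(
    task_id='task1',
    bash_command='echo "running task1"',
    dag=dag,
)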

Airflow UI to turn on and trigger the DAG:


In the above diagram, in the Recent Tasks column, the first circle shows the number of successful tasks, the second circle shows the number of running tasks, and likewise for the failed, upstream_failed, up_for_retry and queued tasks. In the same way, in the DAG Runs column, the first circle shows the number of successful DAGs, the second shows the number of running DAGs and the third shows the number of failed DAGs.

State of Task:

 

TASKS_STATES


State of DAG:

 

DAG_STATE

DAG Detailed Graph View:

Steps to write your own Plugin:

  • Airflow has a simple plugin manager built-in that can integrate external features to its core by simply dropping files in your $AIRFLOW_HOME/plugins folder.
  • The python modules in the plugins folder get imported, and hooks, operators, macros, executors and web views get integrated into Airflow's main collections and become available for use.
  • To create a plugin you will need to derive the airflow.plugins_manager.AirflowPlugin class.
  • Extend the superclasses BaseOperator, BaseHook, BaseExecutor, BaseSensorOperator and BaseView to write your own operator, hook, executor, sensor and view respectively as part of a plugin (a plugin skeleton is sketched after this list).
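
For illustration, a plugin file dropped into the plugins folder might look like the sketch below; the operator class, plugin name and log message are placeholders, and the imports assume Airflow 1.x:

import logging

from airflow.models import BaseOperator
from airflow.plugins_manager import AirflowPlugin

class MyFirstOperator(BaseOperator):
    # A trivial operator; a fuller version is sketched in the
    # "Custom Airflow Operator" section below.
    def execute(self, context):
        logging.info("MyFirstOperator is running")

class MyFirstPlugin(AirflowPlugin):
    name = "my_first_plugin"        # name under which the plugin is registered
    operators = [MyFirstOperator]   # in Airflow 1.x, sensors can be registered here as well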

Custom Airflow Operator:

  • An Operator is an atomic block of workflow logic, which performs a single action.
  • To create a custom Operator class, we define a sub class of BaseOperator.
  • Use the __init__() function to initialize the settings for the given task.
  • Use execute() function to execute the desired task. Any value that the execute method returns is saved as an Xcom message under the key return_value.
  • To debug an operator, install the IPython library ($ pip install ipython), place IPython's embed() command in the execute() method of your operator, and use the "airflow test" command that Airflow ships with to manually start a single operator in the context of a specific DAG run (a full operator sketch follows the command below).

$ airflow test test task1 2018-01-21
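
Putting the points above together, a minimal custom operator might look like the sketch below; the class name, the my_param setting and the log message are illustrative only (Airflow 1.x style imports):

import logging

from airflow.models import BaseOperator
from airflow.utils.decorators import apply_defaults

class MyFirstOperator(BaseOperator):

    @apply_defaults
    def __init__(self, my_param, *args, **kwargs):
        # __init__ only stores the settings for the task; no real work happens here
        super(MyFirstOperator, self).__init__(*args, **kwargs)
        self.my_param = my_param

    def execute(self, context):
        logging.info("MyFirstOperator running with my_param=%s", self.my_param)
        # whatever execute() returns is saved as an XCom message under the key 'return_value'
        return self.my_param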

Custom Airflow Sensor:

  • It is a special type of Operator, typically used to monitor a long running task on another system.
  • A sensor class is created by extending BaseSensorOperator.
  • Use the __init__() function to initialize the settings for the given task.
  • Use the poke() function to perform the desired check; it is called every poke_interval seconds until it returns True, and if it returns False it will be called again (a minimal sensor is sketched after this list).
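
Following the same pattern, a minimal custom sensor might look like the sketch below; the import path assumes Airflow 1.x, and the even-minute condition is just an illustrative check:

import logging
from datetime import datetime

from airflow.operators.sensors import BaseSensorOperator
from airflow.utils.decorators import apply_defaults

class MyFirstSensor(BaseSensorOperator):

    @apply_defaults
    def __init__(self, *args, **kwargs):
        super(MyFirstSensor, self).__init__(*args, **kwargs)

    def poke(self, context):
        # Called every poke_interval seconds until it returns True.
        current_minute = datetime.now().minute
        if current_minute % 2 != 0:
            logging.info("Current minute (%s) is odd, waiting...", current_minute)
            return False
        logging.info("Current minute (%s) is even, sensor succeeds", current_minute)
        return True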

XCom (Cross-Communication):

  • XComs let tasks exchange messages, allowing more nuanced forms of control and shared state.
  • XComs are principally defined by a key, value, and timestamp.
  • XComs can be "pushed" (sent) using the xcom_push() function or "pulled" (received) using the xcom_pull() function. The information passed using XComs is pickled and stored in the Airflow database (xcom table), so it's better to save only small bits of information rather than large objects (a short example follows this list).
    task_instance.xcom_push('key1', value1)
    value = task_instance.xcom_pull('task', key='key1')
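
As an illustration, the two callables below exchange a value through XCom; the DAG id, task ids, key and value are placeholders, and provide_context=True (Airflow 1.x) is used so the task instance is available in kwargs:

from datetime import datetime

from airflow import DAG
from airflow.operators.python_operator import PythonOperator

def push_value(**kwargs):
    # push a small value under an explicit key
    kwargs['task_instance'].xcom_push(key='key1', value='value1')

def pull_value(**kwargs):
    value = kwargs['task_instance'].xcom_pull(task_ids='push_task', key='key1')
    print('pulled value:', value)

dag = DAG('xcom_demo', start_date=datetime(2018, 1, 20), schedule_interval=None)

push_task = PythonOperator(task_id='push_task', python_callable=push_value,
                           provide_context=True, dag=dag)
pull_task = PythonOperator(task_id='pull_task', python_callable=pull_value,
                           provide_context=True, dag=dag)

push_task >> pull_task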

Passing and Accessing runtime arguments to Airflow through the CLI:

  • One can pass runtime arguments at the time of triggering the DAG using the below command –
    $ airflow trigger_dag dag_id --conf '{"key":"value" }'
  • Now, there are two ways in which one can access the parameters passed in the airflow trigger_dag command (both are illustrated in the sketch after this list) –
  1. In the callable method defined in the Operator, one can access the params as kwargs['dag_run'].conf.get('key')
  2. Provided the field where you are using this value is a templated field, one can use {{ dag_run.conf['key'] }}
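
For example, for a DAG triggered with the command above, both access patterns could look like the sketch below; the DAG id, task ids and the key name are placeholders, and schedule_interval is set to None as required by the note that follows:

from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from airflow.operators.python_operator import PythonOperator

def read_conf(**kwargs):
    # 1. Access the parameter from the callable via the dag_run object
    value = kwargs['dag_run'].conf.get('key')
    print('value from conf:', value)

dag = DAG('trigger_demo', start_date=datetime(2018, 1, 20), schedule_interval=None)

python_task = PythonOperator(task_id='read_conf', python_callable=read_conf,
                             provide_context=True, dag=dag)

# 2. Access the parameter in a templated field (bash_command is templated)
bash_task = BashOperator(task_id='echo_conf',
                         bash_command='echo {{ dag_run.conf["key"] }}',
                         dag=dag)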

Note*: The schedule_interval for the externally triggerable DAG is set to None for the above approaches to work. Use the github link to go through all the samples. Enjoy Coding 🙂

Airflow Installation on Mac

Statement: The purpose of this post is to install Airflow on a Mac machine.

Airflow is a platform to programmatically author, schedule and monitor workflows. Use Airflow to author workflows as directed acyclic graphs (DAGs) of tasks. The Airflow scheduler executes your tasks on an array of workers while following the specified dependencies. (Taken from the Apache Airflow official page)

Installation Steps :

  • Need to set up a home directory for Airflow using the below commands –
mkdir ~/Airflow 
export AIRFLOW_HOME=~/Airflow
  • As Airflow is written in Python, first make sure that Python is installed on the machine. If not, use the below commands to install it –
cd Airflow
brew install python python3
  • Now install Airflow using pip (the package management system used to install and manage software packages written in Python).
pip install airflow

Most probably, you will get the installation error given below when using the above command –

“Found existing installation: six 1.4.1

DEPRECATION: Uninstalling a distutils installed project (six) has been deprecated and will be removed in a future version. This is due to the fact that uninstalling a distutils project will only partially uninstall the project.

Uninstalling six-1.4.1:” 

  • So, to avoid this, use the below command to install Airflow successfully –
pip install --ignore-installed six airflow
# To install required packages based on the need
pip install --ignore-installed six airflow[crypto]    # For connection credentials security
pip install --ignore-installed six airflow[postgres]  # For PostgreSQL database
pip install --ignore-installed six airflow[celery]    # For distributed mode: Celery executor
pip install --ignore-installed six airflow[rabbitmq]  # For message queuing and passing between the Airflow server and workers
  • Even after executing the above command, you may get permission errors like "error: [Errno 13] Permission denied: '/usr/local/bin/mako-render'". So give your user permission to the folders used by the above command –
sudo chown -R $USER /Library/Python/2.7
sudo chown -R $USER /usr/local/bin/

Airflow uses an SQLite database, which will be set up alongside the installation and will create the necessary tables to check the status of DAGs (a Directed Acyclic Graph is a collection of all the tasks you want to run, organised in a way that reflects their relationships and dependencies) and other related information.

  • Now, as a last step, we need to initialise the SQLite database using the below command –
airflow initdb
  • Finally, everything is done and it's time to start the web server to play with the Airflow UI using the below command –
airflow webserver -p 8080

Enjoy Airflow in your flow 🙂 Use the github link  to go through all the samples. Enjoy Coding!!

 

Host your application on the Internet

Statement: The sole purpose of this post is to learn how to host your application on the Internet so that anyone can access it from anywhere in the world.

Solution :

  • Sign up for a Heroku account.
  • Download the Heroku CLI to host your application from your local terminal.
  • Log in to your account with your ID and password through the terminal using the below command –

heroku login

  • Create a new repo on your GitHub account.
  • Now clone your repo on your local machine using the below command –

git clone https://github.com/guptakumartanuj/Cryptocurrency-Concierge.git

  • It's time to develop your application. Once it is done, push your whole code to your GitHub repo using the below commands –
  1. tangupta-mbp:Cryptocurrency-Concierge tangupta$ git add .
  2. tangupta-mbp:Cryptocurrency-Concierge tangupta$ git commit -m "First commit of Cryptocurrency Concierge"
  3. tangupta-mbp:Cryptocurrency-Concierge tangupta$ git push
  • Now you are ready to create a Heroku app. Use the below command for the same –
cd ~/workingDir
$ heroku create
Creating app... done, ⬢ any-random-name
https://any-random-name.herokuapp.com/ | https://git.heroku.com/any-random-name.git
  • Now push your application to Heroku using the below command –

tangupta-mbp:Cryptocurrency-Concierge tangupta$ git push heroku master

  • It's time to access your hosted application using the URL shown in the heroku create output above. But most probably you won't be able to access it yet; make sure one instance of your hosted application is running. Use the below command to do so –

heroku ps:scale web=1

  • In case you get the below error while running the above command, you need to create a file named Procfile (with no extension) and add it to the git repo. Then push the repo to Heroku again.

Scaling dynos… !

    Couldn’t find that process type.

  • In my case, to run my Spring Boot application, I added the following line to the Procfile.

          web: java $JAVA_OPTS -Dserver.port=$PORT -jar target/*.war

  • Finally, your application should be up and running. In case you face any issues while pushing or running your application, you can check the Heroku logs, which will help you troubleshoot the issue, using the below command –

heroku logs --tail

Enjoy coding and Happy Learning 🙂 

 

Redirect local IP (web application) to Internet (Public IP)

Statement: The purpose of this post is to expose an application that is running locally to the Internet. In other words, the requirement is to redirect a local IP (web application) to the Internet (a public IP).

Solution :

  1.  Download ngrok on your machine.
  2.  Let's say my application is running locally (localhost/127.0.0.1) on port 8080 and I want to make it publicly visible so that other users can access it. Use the below command to get the public URL.

           tangupta-mbp:Downloads tangupta$ ./ngrok http 8080

In the output of the above command, you will get the below console –

ngrok by @inconshreveable

Session Status                connecting
Version                       2.2.8
Region                        United States (us)
Web Interface                 http://127.0.0.1:4040
Forwarding                    http://23b81bac.ngrok.io -> localhost:8080
Forwarding                    https://23b81bac.ngrok.io -> localhost:8080

  3. Now, you will be able to access your application using the above http or https forwarding URL.

Hope it works for you and fulfils your purpose of accessing your application publicly. Enjoy Learning 🙂

Pagination Support in DocumentDB using the Java SDK API

Statement: The sole purpose of this post is to implement pagination in your application using the DocumentDB Java SDK API.

Solution :

  1. First, you need to create a FeedOptions object and set the page size (in my case, 10) using the below snippet –

final FeedOptions feedOptions = new FeedOptions();

        feedOptions.setPageSize(10);

  2. Now you need to fetch the data using the queryDocuments() API of the Java SDK, passing the above feedOptions as one of its arguments –

FeedResponse<Document> feedResults = documentClient.queryDocuments(collectionLink, queryString, feedOptions);

  3. This is the main step, which helps you get page responses according to the configured page size. To make it happen, the continuation token comes into the picture. Use the below snippet to get the continuation token, which can be used for pagination in subsequent calls as shown below –

String continuationToken = feedResults.getResponseContinuation();

  4. So, by using the above token (passed back via feedOptions.setRequestContinuation(continuationToken) on the next query), we can make the next call and read one block of results –

List<String> finalRes = new ArrayList<String>();
List<Document> docs;
boolean nextFlag = false;

if (feedResults != null) {
    if ((docs = feedResults.getQueryIterable().fetchNextBlock()) != null) {
        for (Document doc : docs) {
            finalRes.add(doc.toJson());
        }
        if (feedResults.getQueryIterable().iterator().hasNext()) {
            nextFlag = true;   // more pages are available
        }
    }
}

I hope this helps you add pagination support to your application. Keep in mind that this only works on a single partition. For cross-partition queries, the continuation token does not work, as it has a range component, and in that case the query will return a null token. Enjoy coding 🙂