Hadoop Learning – Day 1

Today is day 1 of my Hadoop self-learning log. I started with the preinstalled Hadoop VM that Cloudera provides for VirtualBox.


I followed the instructions and created a VM from the VHD downloaded from Cloudera.

Once the system was up and running, I started watching "Hadoop Tutorial: Intro to HDFS", provided by http://marakana.com/ on YouTube.

[cloudera@localhost ~]$ hadoop

Usage: hadoop [--config confdir] COMMAND

      where COMMAND is one of:

 fs                   run a generic filesystem user client

 version              print the version

 jar <jar>            run a jar file

 distcp <srcurl> <desturl> copy file or directories recursively

 archive -archiveName NAME -p <parent path> <src>* <dest> create a hadoop archive

 classpath            prints the class path needed to get the

                      Hadoop jar and the required libraries

 daemonlog            get/set the log level for each daemon


 CLASSNAME            run the class named CLASSNAME

Most commands print help when invoked w/o parameters.

[cloudera@localhost ~]$ hadoop -fs

Error: No command named `-fs' was found. Perhaps you meant `hadoop fs'

[cloudera@localhost ~]$ hadoop fs

Usage: hadoop fs [generic options]

   [-cat [-ignoreCrc] <src> ...]

   [-chgrp [-R] GROUP PATH...]

   [-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...]

   [-chown [-R] [OWNER][:[GROUP]] PATH...]

   [-copyFromLocal <localsrc> ... <dst>]

   [-copyToLocal [-ignoreCrc] [-crc] <src> ... <localdst>]

   [-count [-q] <path> ...]

   [-cp <src> ... <dst>]

   [-df [-h] [<path> ...]]

   [-du [-s] [-h] <path> ...]


   [-get [-ignoreCrc] [-crc] <src> ... <localdst>]

   [-getmerge [-nl] <src> <localdst>]

   [-help [cmd ...]]

   [-ls [-d] [-h] [-R] [<path> ...]]

   [-mkdir [-p] <path> ...]

   [-moveFromLocal <localsrc> ... <dst>]

   [-moveToLocal <src> <localdst>]

   [-mv <src> ... <dst>]

   [-put <localsrc> ... <dst>]

   [-rm [-f] [-r|-R] [-skipTrash] <src> ...]

   [-rmdir [--ignore-fail-on-non-empty] <dir> ...]

   [-setrep [-R] [-w] <rep> <path/file> ...]

   [-stat [format] <path> ...]

   [-tail [-f] <file>]

   [-test -[ezd] <path>]

   [-text [-ignoreCrc] <src> ...]

   [-touchz <path> ...]

   [-usage [cmd ...]]

Generic options supported are

-conf <configuration file>     specify an application configuration file

-D <property=value>            use value for given property

-fs <local|namenode:port>      specify a namenode

-jt <local|jobtracker:port>    specify a job tracker

-files <comma separated list of files>    specify comma separated files to be copied to the map reduce cluster

-libjars <comma separated list of jars>    specify comma separated jar files to include in the classpath.

-archives <comma separated list of archives>    specify comma separated archives to be unarchived on the compute machines.

The general command line syntax is

bin/hadoop command [genericOptions] [commandOptions]

[cloudera@localhost ~]$ hadoop fs -ls /

Found 4 items

drwxr-xr-x   - hbase  supergroup          0 2013-04-17 04:17 /hbase

drwxrwxrwt   - mapred supergroup          0 2013-04-16 19:09 /tmp

drwxr-xr-x   - hdfs   supergroup          0 2013-04-16 19:13 /user

drwxr-xr-x   - yarn   supergroup          0 2013-04-16 19:13 /var

[cloudera@localhost ~]$ hadoop fs -ls /user

Found 3 items

drwxr-xr-x   - cloudera cloudera            0 2013-04-16 19:13 /user/cloudera

drwxrwx---   - mapred   mapred              0 2013-04-16 19:13 /user/history

drwxr-xr-x   - hue      supergroup          0 2013-04-16 19:13 /user/hive

[cloudera@localhost ~]$ ls

16april2013.txt  Desktop  Documents  Downloads  Music  Pictures  Public  Templates  typescript  Videos

[cloudera@localhost ~]$ hadoop fs -mkdir /user/cloudera/new_dir

[cloudera@localhost ~]$ hadoop fs -ls /user/cloudera

Found 1 items

drwxr-xr-x   - cloudera cloudera          0 2013-04-17 04:51 /user/cloudera/new_dir

[cloudera@localhost ~]$ hadoop fs -ls /user/cloudera/new_dir

[cloudera@localhost ~]$

I downloaded shakespeare.txt from the internet to my home directory and then copied it into Hadoop (HDFS).

[cloudera@localhost ~]$ hadoop fs -copyFromLocal shakespeare.txt /user/cloudera/new_dir

[cloudera@localhost ~]$ hadoop fs -ls /user/cloudera/new_dir

Found 1 items

-rw-r--r--   1 cloudera cloudera    5590193 2013-04-17 05:08 /user/cloudera/new_dir/shakespeare.txt

[cloudera@localhost ~]$ exit
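
The mkdir, copyFromLocal and ls steps from the session above can be collected into one small function. This is only a sketch and was not part of the original session: the name hdfs_upload is hypothetical, and it relies solely on flags that appear in the hadoop fs usage output above (-mkdir -p, -test -e, -copyFromLocal, -ls).

```shell
#!/bin/sh
# Sketch only: hdfs_upload is a hypothetical helper, not a Hadoop command.
# Usage: hdfs_upload <local file> <HDFS directory>
hdfs_upload() {
    src="$1"
    dest_dir="$2"

    if [ ! -f "$src" ]; then
        echo "local file not found: $src" >&2
        return 1
    fi

    # -p makes mkdir succeed even when the directory already exists.
    hadoop fs -mkdir -p "$dest_dir" || return 1

    # -test -e exits 0 when the path exists; skip the copy in that case,
    # because -copyFromLocal fails if the destination file is already there.
    target="$dest_dir/$(basename "$src")"
    if hadoop fs -test -e "$target"; then
        echo "already in HDFS: $target"
    else
        hadoop fs -copyFromLocal "$src" "$dest_dir" || return 1
    fi

    # List the directory to confirm the file and its size.
    hadoop fs -ls "$dest_dir"
}
```

With the function loaded in the shell, the copy from this session would be: hdfs_upload shakespeare.txt /user/cloudera/new_dir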

Cluster Summary
Cluster Summary after copying shakespeare.txt (the file count changed from 52 to 53, along with a few other values)
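
A rough command-line counterpart to watching the file count in the Cluster Summary is hadoop fs -count, which is listed in the usage output above and prints the directory count, file count, content size and path name for a path. The sketch below is an assumption-laden illustration rather than something from the original session: hdfs_file_count and hdfs_files_added are hypothetical names, and the web UI figure may count files and directories together, so the absolute numbers need not match 52 and 53.

```shell
#!/bin/sh
# Sketch only: both function names below are hypothetical helpers.
# "hadoop fs -count <path>" prints: DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME
hdfs_file_count() {
    hadoop fs -count "$1" | awk '{ print $2 }'
}

# Print how many files a command added under an HDFS path.
# Usage: hdfs_files_added <path> <command> [args...]
hdfs_files_added() {
    count_path="$1"
    shift
    before=$(hdfs_file_count "$count_path")
    "$@" || return 1
    after=$(hdfs_file_count "$count_path")
    echo $((after - before))
}
```

For the copy in this post, one would expect the following to print 1: hdfs_files_added / hadoop fs -copyFromLocal shakespeare.txt /user/cloudera/new_dir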

As soon as I started typing localhost in the browser, as shown in the video, autocomplete offered the following localhost URLs:

Hadoop JobTracker :

HBase Master :

Hue : Query Editor

Hadoop YARN


Azure PowerShell 0.6.13

Azure PowerShell 0.6.13 has been released with substantial changes to support IaaS General Availability.

Download Link: PowerShell (Github)

2013.04.16 Version 0.6.13

  • Completely fixed issues with first website creation on a new account. Now you can use PowerShell with a new account directly without the need to go to the Azure portal.
  • Added Get-AzureWebsiteLog -ListPath to get all the available log paths of the website.
  • Fixed the issue of removing custom DNS names in Start/Stop/Restart-AzureWebsite.
  • Fixed several GB18030 encoding issues.
  • AzureRT team
    • BREAKING CHANGE: New-AzureVM and New-AzureQuickVM now require an -AdminUserName parameter when creating Windows based VMs.
    • Added support for virtual machine high memory SKUs (A6 and A7).
    • Remote PowerShell is now enabled by default on Windows based VMs using https. To disable: specify the -DisableWinRMHttps parameter on New-AzureQuickVM or Add-AzureProvisioningConfig. To enable using http: specify -EnableWinRMHttp parameter (Note: http is intended for VM to VM communication and a public endpoint is not created by default).
    • Added Get-AzureWinRMUri new cmdlet to get the connection string URI for Windows Remote Management.
    • Added Set-AzureAvailabilitySet new cmdlet to group similar virtual machines into an availability set after deployment.
    • New-AzureVM and New-AzureQuickVM now support a parameter named -X509Certificates. When a certificate is added to this array it is automatically uploaded and deployed to the virtual machine.
    • Improved *-AzureEndpoint cmdlets:
      • Allows a simple endpoint to be created.
      • Allows a load balanced endpoint to be created.
      • Allows a load balanced endpoint to be created with a health probe and you can now specify the Probe Interval and Timeout periods.
    • Removed subscription check requirement when using Add-AzureVHD with a shared access signature.
    • Added Simultaneous Upgrade option to New-AzureDeployment for Cloud Services deployment. This option can save a significant amount of time during deployments to staging. This option can cause downtime and should only be used in non-production deployments.
    • Upgraded to the latest service management library.
    • Made New-AzureDeployment use SSL during the deployment.
  • Storage team
    • Renamed Start/Stop-CopyAzureStorageBlob to Start/Stop-AzureStorageBlobCopy. Kept old names as aliases for backward compatibility.

-Abhishek Anand

Windows Azure PowerShell – Storage Commands

In this post we will walk through a few commands for working with Windows Azure Storage.

  •  List of all Windows Azure Subscriptions

Code Snippet:
Write-Host "Selection: Show Me List of Windows Azure Subscriptions" -ForegroundColor Magenta
Start-Sleep -Seconds 1
# List every subscription known to this PowerShell session and flag the default one
Get-AzureSubscription | Select-Object SubscriptionName, IsDefault
Write-Host "The subscription with IsDefault set to True will be used for operations" -ForegroundColor Magenta

  •  List of all Windows Azure Storage Accounts

Code Snippet:
Write-Host "Selection: List of Windows Azure Storage Services" -ForegroundColor Magenta
Start-Sleep -Seconds 1
# Show the display label and data center location of each storage account
Get-AzureStorageAccount | Select-Object Label, Location

  •  Create New Windows Azure Storage Account in a Specified Location

Code Snippet:
Write-Host "Selection: Create New Windows Azure Storage Account in a Specified Location" -ForegroundColor Magenta
Start-Sleep -Seconds 1
#New-AzureStorageAccount [-StorageAccountName] <String> [-Description <String>] [-Label <String>] -Location <String> [<CommonParameters>]
# Storage account names must be 3 to 24 characters long, using lowercase letters and numbers only
[string]$tempStorageAccountName = $(Read-Host -Prompt "Please specify new Windows Azure Storage Account Name: ")
[string]$tempDescription = $(Read-Host -Prompt "Please specify Description for Windows Azure Storage Account: ")
[string]$tempLabel = $(Read-Host -Prompt "Please specify Display Name for Windows Azure Storage Account: ")
# Print the available data center locations so the user can pick one
Get-AzureLocation | Select-Object Name
[string]$tempLocation = $(Read-Host -Prompt "Please specify Location for Windows Azure Storage Account (from Above List): ")
New-AzureStorageAccount -StorageAccountName $tempStorageAccountName -Description $tempDescription -Label $tempLabel -Location $tempLocation
Write-Host "Storage account $tempLabel has been created in $tempLocation"

New blob storage cmdlets in Azure PowerShell 0.6.12

Azure PowerShell 0.6.12 was released last week and includes new cmdlets for working with blobs in general, not just VHD files. To install, go to azure.com, click Downloads at the top, then click Windows Azure PowerShell under Command Line Tools at the bottom, or just click this link:


2013.03.20 Version 0.6.12
* Windows Azure Storage entity level cmdlets
   * New-AzureStorageContext
   * New-AzureStorageContainer
   * Get-AzureStorageContainer
   * Remove-AzureStorageContainer
   * Get-AzureStorageContainerAcl
   * Set-AzureStorageContainerAcl
   * Get-AzureStorageBlob
   * Get-AzureStorageBlobContent
   * Set-AzureStorageBlobContent
   * Remove-AzureStorageBlob
   * Start-CopyAzureStorageBlob
   * Stop-CopyAzureStorageBlob
   * Get-AzureStorageBlobCopyState

* Windows Azure Web Sites diagnostics log streaming cmdlet
   * Get-AzureWebsiteLog -Tail