Thursday, May 30, 2024

Lab 4: Geoprocessing

This lab was less about Python and more about enhancing the capabilities of ArcGIS Pro with Python. The lab focused on developing scripts using ModelBuilder and Notebooks.

Many low-code solutions like ModelBuilder tend to be a letdown for me. They often generate verbose code that does not scale very well. ModelBuilder does have some nuances, but for its given intention it is a great tool. When performing an analysis the process must be reproducible, and ModelBuilder helps accomplish this task.

Some consider ModelBuilder an automation tool, yet it requires ArcGIS Pro and an analyst to initiate the model. So to say that it provides automation is subjective or limited in scope. Automation would be a reactive information system that ingests, responds to, or performs some action based on an event. This could be with or without human involvement beyond starting and managing the process. The point is that it is reactive in nature.
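To make "reactive" a bit more concrete, here is a minimal sketch of what I mean, using only the standard library. The names find_new_files and react, and the idea of watching a folder for new files, are my own illustration and not anything ArcGIS Pro provides. A real system would run this on a schedule or use a proper file-watching service.

```python
import tempfile
from pathlib import Path


def find_new_files(folder, seen):
    """Return files in folder that are not yet in seen, and record them."""
    new_files = sorted(p for p in Path(folder).iterdir()
                       if p.is_file() and p.name not in seen)
    seen.update(p.name for p in new_files)
    return new_files


def react(folder, seen, handler):
    """Call handler once for each newly arrived file (the 'event')."""
    return [handler(path) for path in find_new_files(folder, seen)]


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as inbox:
        seen = set()
        print(react(inbox, seen, lambda p: p.name))  # nothing has arrived yet
        Path(inbox, 'hospitals.csv').write_text('id,x,y\n')
        print(react(inbox, seen, lambda p: p.name))  # reacts to the new file
```

The handler is where a geoprocessing model or script would be called. The point is that the arrival of data triggers the work, not a person clicking Run.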

The Notebook provided in ArcGIS Pro is like any other Jupyter notebook, with cells for Markdown or Python code. Notebooks are a great way to collaborate and exchange information but make for a terrible integrated development environment (IDE). This goes for Jupyter's native notebooks too. They lack most of the tooling and responsiveness that other IDEs provide.

Before venturing into my thoughts on development, style, and tooling, here is a quick screenshot of my Python running in ArcGIS Pro's notebook.

Output for Lab 4 Geoprocessing Script in ArcGIS Notebook

IDLE

The built-in IDE is great for quick one-off scripts or lightweight debugging. Beyond that, say for an application or module, it is just a cognitive nightmare navigating between files, functions, and classes. This tool is my go-to only in a pinch, and only for a quick fix.

Jupyter Notebooks

I use these a lot in my daily job. I prefer the web-based interface over IDLE for more robust things, and when used as a way to communicate something about data in a reproducible format, these notebooks rock. Yet I rarely develop a notebook using the native interface.

VS Code

What I'm about to say is probably going to hurt some feelings, but VS Code is an utter nightmare to set up. When you need to use it for multiple languages, that nightmare becomes a full feature-length horror movie. When using VS Code, I think I spend more time installing and configuring extensions than doing work. Yet once things are set up, its latency for IntelliSense and code completion is lower than in IDLE and Jupyter.

PyCharm or IDEA

I make it no secret that I'm a JetBrains fanboy. I'm a huge open source fanatic, but one who is willing to pay for software that works and works well, and JetBrains products fall into this category. Their Python-specific IDE is named PyCharm and is a charm to work with. The community version provides almost all the tooling any Python developer could want, but the professional edition comes with extras like remote development, HTTP Client, Jupyter integration, SSH, Database Tools, and framework support for Django, Flask, and FastAPI. This and the other JetBrains products are worth every penny!

Coding Style

Code should follow rules like any other well-formatted document. Yes, code is like a personal or business letter, an academic paper, a poem, or any other document intended to be read by people. This is another reason I dislike many low-code solutions: the generated code tends to be a giant soup that is difficult for mere mortals to read and understand. As with any document, choosing which format to follow can be a minefield, but luckily there are well-established standards which you may adopt.

For Python there is PEP 8, the Style Guide for Python Code. This guide is great but leaves much for the author to decide. It also references other Python Enhancement Proposals (PEPs) for things like docstrings, which come in a number of flavors. The gist is to stay consistent.
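As a small illustration of "stay consistent", here is a sketch of one docstring flavor applied to a function. The function buffer_label is hypothetical and exists only for this example. The "Args:" layout follows the Google style, which is just one of the flavors you could pick.

```python
def buffer_label(feature, distance, unit='Meter'):
    """Return a readable label for a buffer, such as 'hospitals (1000 Meter)'.

    Args:
        feature: Name of the feature class being buffered.
        distance: Buffer distance as a number.
        unit: Linear unit name; 'Meter' is only an illustrative default.
    """
    return f'{feature} ({distance} {unit})'


if __name__ == '__main__':
    print(buffer_label('hospitals', 1000))
```

Whichever flavor you choose, using the same one across a module matters more than which one it is.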

For PHP there is PSR-12: Extended Coding Style. Unlike PEP 8, this document is far more strict, and many PHP projects use it as, at the very least, a foundation to ensure code remains consistent.

The Java language typically uses two style guides, which have very subtle differences. The original, dating back to 1999, is the Code Conventions for the Java Programming Language, and the somewhat newer document is the Google Java Style Guide.

Google has a number of style guides which may be found at https://google.github.io/styleguide/.

Why This Bit About Style?

A common theme in Python is exceptionally long lists of class, function, or method arguments. Most coding practices discourage this, but that seems to be lost on Python. Often when given three or more arguments I will use a dictionary and its ability to be deconstructed, or unpacked, into keyword arguments. What do I mean? Nothing is better than a code example.

In the lab we are asked to perform three geoprocessing actions: add XY coordinates to an existing feature class, perform a buffer analysis, and finally dissolve the output from the buffer into a single feature. Of these three, the buffer analysis requires at least three arguments, four if including the dissolve processing step. This tool alone may take up to eight arguments depending on the desired results.

Consider these two code snippets. Which do you prefer? Hint: there is no wrong answer!

import arcpy

# Set the base input arguments
feature = 'hospitals'
buffer_distance = 1000
buffer_unit = 'Meter'

# Derive the buffer analysis arguments from the base inputs
buffer_args = {
    'in_features': feature,
    'out_feature_class': f'{feature}_buffer',
    'buffer_distance_or_field': f'{buffer_distance} {buffer_unit}'
}

# Announce the step, run the tool, then print the geoprocessing messages
print(f'\nExecuting buffer analysis on {feature}...')
arcpy.analysis.Buffer(**buffer_args)
print(arcpy.GetMessages())

Notice the double asterisk in **buffer_args. This notation unpacks the dictionary and passes its items as keyword arguments to arcpy.analysis.Buffer. There is just one rule: each dictionary key must match a parameter name of the function being called. So this is effectively the same as:

arcpy.analysis.Buffer(in_features=feature, out_feature_class=f'{feature}_buffer',
                      buffer_distance_or_field=f'{buffer_distance} {buffer_unit}')
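If you want to try the key-matching rule without ArcGIS Pro, here is a minimal sketch in plain Python. The buffer function below is a stand-in I made up that only echoes its arguments. It is not the arcpy tool, though it borrows the same three parameter names.

```python
def buffer(in_features, out_feature_class, buffer_distance_or_field):
    """Stand-in for a geoprocessing tool; it only echoes its arguments."""
    return (in_features, out_feature_class, buffer_distance_or_field)


feature = 'hospitals'
buffer_args = {
    'in_features': feature,
    'out_feature_class': f'{feature}_buffer',
    'buffer_distance_or_field': '1000 Meter',
}

# Unpacking the dictionary and spelling out the keywords give the same call
unpacked = buffer(**buffer_args)
explicit = buffer(in_features=feature,
                  out_feature_class=f'{feature}_buffer',
                  buffer_distance_or_field='1000 Meter')


def call_with_typo():
    """Rename one key so it no longer matches a parameter name."""
    bad_args = dict(buffer_args)
    bad_args['out_features'] = bad_args.pop('out_feature_class')
    try:
        buffer(**bad_args)
    except TypeError as error:
        return str(error)
    return None


if __name__ == '__main__':
    print(unpacked == explicit)
    print(call_with_typo())
```

A key that does not match a parameter name raises a TypeError as soon as the call is made, so a typo in the dictionary does not slip by silently.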

Regardless of style, it is important to avoid repeating string literals. Using the variable feature instead of typing the string 'hospitals' over and over prevents bugs from typos and lets me easily configure the script's target feature class.
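Putting both ideas together, here is a rough sketch of how all three lab steps could be derived from a single feature variable. The helpers build_steps and run_steps and the output name ending in _buffer_dissolve are my own assumptions, not part of the lab. The sketch also assumes the arcpy workspace is already set so the feature class names resolve. Only run_steps needs ArcGIS Pro.

```python
def build_steps(feature, distance, unit):
    """Return (toolbox, tool, keyword arguments) for each lab step."""
    buffered = f'{feature}_buffer'
    dissolved = f'{feature}_buffer_dissolve'  # hypothetical output name
    return [
        ('management', 'AddXY', {'in_features': feature}),
        ('analysis', 'Buffer', {
            'in_features': feature,
            'out_feature_class': buffered,
            'buffer_distance_or_field': f'{distance} {unit}',
        }),
        ('management', 'Dissolve', {
            'in_features': buffered,
            'out_feature_class': dissolved,
        }),
    ]


def run_steps(steps):
    """Execute each step with arcpy; requires an ArcGIS Pro Python environment."""
    import arcpy  # imported here so build_steps works without ArcGIS Pro

    for toolbox, tool, kwargs in steps:
        print(f'\nExecuting {tool}...')
        getattr(getattr(arcpy, toolbox), tool)(**kwargs)
        print(arcpy.GetMessages())


steps = build_steps('hospitals', 1000, 'Meter')

if __name__ == '__main__':
    for toolbox, tool, kwargs in steps:
        print(toolbox, tool, kwargs)
```

Changing 'hospitals' in one place now renames every input and output, and calling run_steps(steps) inside ArcGIS Pro would execute the tools in order.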

Happy Coding!
