A LAS class is a representation in R of a las file that aims to
respect as closely as possible the official LAS
specification that describes the file format. “As closely as
possible” means that, due to R internal limitations, it is not possible
to represent a las file exactly as it should be represented.
Additionally, some aspects of the las format specifications are not
supported in lidR. Still, the contents of a LAS
object must
reflect the fact that it is a representation of a standardized format,
so some restrictions are imposed on users.
The function readLAS
reads one or several .las or .laz
file(s) to build a LAS object.
LASfile <- system.file("extdata", "example.laz", package="rlas")
las <- readLAS(LASfile)
print(las)
#> class : LAS (v1.0 format 1)
#> memory : 5.5 Kb
#> extent : 339002.9, 339015.1, 5248000, 5248001 (xmin, xmax, ymin, ymax)
#> coord. ref. : NAD83 / UTM zone 17N
#> area : 9.9 m²
#> points : 30 points
#> density : 3.03 points/m²
#> density : 2.63 pulses/m²
A LAS
object is composed of four slots:
@data
, @header
, @crs
and
@index
.
The slot data
of a LAS object contains a
data.table
with the data read from .las or .laz file(s).
The columns of the table are named after the LAS
specification version 1.4. Each name is reserved and is associated
with a given type:
X
Y
Z
(dbl)Intensity
(int)gpstime
(dbl)ReturnNumber
(int)NumberOfReturns
(int)ScanDirectionFlag
(int)EdgeOfFlightline
(int)Classification
(int)ScannerChannel
(int) (point format >= 6)Synthetic_flag
(bool)Withheld_flag
(bool)Keypoint_flag
(bool)Overlap_flag
(bool) (point format >= 6)ScanAngleRank
(int) (point format < 6)ScanAngle
(dbl) (point format >= 6)UserData
(int)PointSourceID
(int)R
G
B
(int)NIR
(int)Here we can already see some deviations from the official las format
specifications. For example, the attribute ‘Classification’ should be an
unsigned char
stored on 8 bits. However, the R language
does not support this data type and consequently this attribute is
stored in a 32-bit signed int
. One can read the official
las specifications to figure out the other deviations from the original
file format induced by the fact that R only has 32-bit signed integers
and 64-bit signed decimal numbers.
A LAS
object contains a slot @header
that
represents the header of the las file. The header is stored in a
LASheader
object. A LASheader
object contains
two slots: @PHB
for the public header block and
@VLR
for the variable length records. Both slots are lists
labelled according to the las file format specification. See public
documentation of las file format for more information about las
headers. Users should never normally have to worry about the header as
long as they use functions from lidR. Everything is managed internally
to ensure that objects are valid. However, users still need to know that
the contents of the header are important, especially when writing
LAS
objects into las or laz files.
print(header(las))
#> File signature: LASF
#> File source ID: 0
#> Global encoding:
#> - GPS Time Type: GPS Week Time
#> - Synthetic Return Numbers: no
#> - Well Know Text: CRS is GeoTIFF
#> - Aggregate Model: false
#> Project ID - GUID: 00000000-0000-0000-0000-000000000000
#> Version: 1.0
#> System identifier: LAStools (c) by rapidlasso GmbH
#> Generating software: las2las (version 201011)
#> File creation d/y: 343/2011
#> header size: 227
#> Offset to point data: 323
#> Num. var. length record: 1
#> Point data format: 1
#> Point data record length: 28
#> Num. of point records: 30
#> Num. of points by return: 26 4 0 0 0
#> Scale factor X Y Z: 0.001 0.001 0.001
#> Offset X Y Z: 6e+05 6500000 0
#> min X Y Z: 339002.9 5248000 973.145
#> max X Y Z: 339015.1 5248001 978.345
#> Variable Length Records (VLR):
#> Variable Length Record 1 of 1
#> Description: by LAStools of rapidlasso GmbH
#> Tags:
#> Key 1024 value 1
#> Key 3072 value 26917
#> Key 3076 value 9001
#> Key 4099 value 9001
#> Extended Variable Length Records (EVLR): void
The slot @crs
contains a crs
object from
package sf
and stores the coordinate reference system (CRS)
of the las file. In the official las specifications the CRS is stored in
the header. In a LAS object the CRS is stored in the header using the
EPSG code of the CRS or a WKT string (depending how it has been recorded
in the file), but it is also stored in the slot @crs
. This
is to ensure it meets R standards and is in accordance with other
spatial data packages in the R ecosystem. Consequently, to get a valid
LAS object properly written into a las file it is important to set the
CRS using the function st_crs()
. This function updates the
header of the LAS object and the crs
According to LAS specification the CRS system can also be a WKT
string when the WKT bit flag is set to TRUE
.
Records what kind of spatial index will be used internally to perform
computations that require spatial indexing. See
help("lidR-spatial-index")
.
R users who are used to manipulating spatial data are likely to be
very familiar with the sp
or sf
packages and
all the classes used to store spatial data, such as
SpatialPointsDataFrame
,
SpatialPolygonsDataFrame
, sf
and so on. The
data contained in these classes are freely modifiable by the user
because they can be of any type. A LAS
object is not freely
modifiable because it is a strongly standardized representation of a las
file.
For example, users cannot replace the Classification
attribute with the value 0
because 0
is a
decimal number in R and the ‘Classification’ attribute is an integer.
The following throws an error:
las$Classification <- 0
#> Error: Trying to replace data of type integer by data of type double: this action is not allowed
In R 0L
is an integer and thus the following is
allowed:
It would be possible to automatically cast the input into the correct type without throwing an error. But for the lidR package we chose to be very pedantic on this point to avoid any potential problems and because we would prefer users to be careful about the content of their data.
The addition of a new column is also restricted. For example, one may
want to add an attribute R
corresponding to the red
channel.
las$R <- 0
#> Error: Addition of a new column using [[ is forbidden for LAS objects. See ?add_attribute
This is not allowed because a LAS object should always be valid. By allowing the user to add an R column the LAS object would no longer be valid for two reasons:
R
is a reserved name of the core attributes and must be
an integer. In the example above it is a decimal number.In consequence, adding a column must be done via the functions
add_attribute
, add_lasattribute
,
add_lasrgb
. This way users are forced to read the
documentation of these two functions. And yet some restrictions are
still in place. For example, the following is not allowed for the same
reasons as above:
las <- add_attribute(las, 0, "R")
#> Error: R is part of the core attributes and is a forbidden name.
But anyway, R being R, there is no way to completely restrict editing of objects. Users can always by-pass the restrictions to make LAS objects that are not strictly valid:
In conclusion, a LAS object is not actually immutable but at least there are some restrictions to ensure that the user is aware that not everything is authorized.
As we have seen, a LAS object contains a core of attributes associated with reserved names in accordance with the las specifications. It is possible, however, to add more attributes to a LAS object even if they are not part of the core attributes imposed by the las specifications.
Extra attributes are just like adding a column in a regular table in
R. One can freely modify the data using the function
add_attribute
. It is thus possible to add an attribute to a
LAS object. For example, it is possible to attribute an ID to each point
and use this value in subsequent code:
But it is important to understand that this attribute is invalid with respect to the las specifications. Thus it can be used at the R level but will not be written in a las file and thus will be lost at write time. Depending on the purpose of this attribute it may or may not be useful to be able to write this extra data. Most of the time the information is only useful at the R level but sometimes it might be appropriate to store the data in a file.
The las specifications allow for storing extra attributes that are
not part of the core attributes. The way to do this is more complex.
Basically it is called extra bytes attributes and it implies
modification of the LAS object header to indicate that the contents of
the file contains more than the core attributes. This is abstracted with
the function add_lasattribute
.
Using this function, the header is updated according to the las specification and thus the extra bytes attributes can be written in the file. lidR supports up to 10 extra bytes attributes. The extra bytes attributes are limited to being of type numeric. Indeed, the las specifications do not allow for storing extra bytes attributes of type string or type boolean. Thus the following fails:
It is common that users report bugs arising from the fact that a
point cloud is invalid. This is why we introduced the function
las_check
to perform a deep inspection of LAS objects. This
function checks if a LAS object is in accordance with the las
specifications but also it checks for weird point clouds that could be
valid with respect to the specifications but invalid for actual
processing. For example, it often happens that a las file contains
duplicated points for no valid reason. This may lead to trees being
detected twice, to invalid metrics, or to errors in DTM generation, and
so on…
las_check(las)
#>
#> Checking the data
#> - Checking coordinates...[0;32m ✓[0m
#> - Checking coordinates type...[0;32m ✓[0m
#> - Checking coordinates range...[0;32m ✓[0m
#> - Checking coordinates quantization...[0;32m ✓[0m
#> - Checking attributes type...[0;32m ✓[0m
#> - Checking ReturnNumber validity...[0;32m ✓[0m
#> - Checking NumberOfReturns validity...[0;32m ✓[0m
#> - Checking ReturnNumber vs. NumberOfReturns...[0;32m ✓[0m
#> - Checking RGB validity...[0;32m ✓[0m
#> - Checking absence of NAs...[0;32m ✓[0m
#> - Checking duplicated points...[0;32m ✓[0m
#> - Checking degenerated ground points...[0;37m skipped[0m
#> - Checking attribute population...[0;32m ✓[0m
#> - Checking gpstime incoherances[0;32m ✓[0m
#> - Checking flag attributes...[0;32m ✓[0m
#> - Checking user data attribute...
#> [0;32m 🛈 30 points have a non 0 UserData attribute. This probably has a meaning[0m
#> Checking the header
#> - Checking header completeness...[0;32m ✓[0m
#> - Checking scale factor validity...[0;32m ✓[0m
#> - Checking point data format ID validity...[0;32m ✓[0m
#> - Checking extra bytes attributes validity...[0;32m ✓[0m
#> - Checking the bounding box validity...[0;32m ✓[0m
#> - Checking coordinate reference system...[0;32m ✓[0m
#> Checking header vs data adequacy
#> - Checking attributes vs. point format...[0;32m ✓[0m
#> - Checking header bbox vs. actual content...[0;32m ✓[0m
#> - Checking header number of points vs. actual content...[0;32m ✓[0m
#> - Checking header return number vs. actual content...[0;32m ✓[0m
#> Checking coordinate reference system...
#> - Checking if the CRS was understood by R...[0;32m ✓[0m
#> Checking preprocessing already done
#> - Checking ground classification...[0;31m no[0m
#> - Checking normalization...[0;31m no[0m
#> - Checking negative outliers...[0;32m ✓[0m
#> - Checking flightline classification...[0;32m yes[0m
#> Checking compression
#> - Checking attribute compression...[0;31m no[0m
lidR provides a simple plot
function to plot a LAS
object in 3D. It is based on the rgl
package. The
rgl
package is amazing but has some problems working with
large point clouds. We are currently developing our own viewer to
overcome this issue. This viewer is fully compatible with
lidR
but still in heavy development.
The parameter color
expects the name of the attribute
you want to use to colorize the points. Default is Z
If your file contains RGB data the string "RGB"
is
supported:
This section is of major importance because there are many instances where R is weak at memory management.
Firstly, it is important to note that R only enables manipulation of 32-bit integers and 64-bit decimal numbers. But the las specification states, for example, that the intensity is stored on 16 bits (see previous sections). When read in R it must be converted to 32 bits and therefore will use twice as much memory than is needed. Worse, the return numbers are stored on 3 bits in las files but 32 bits in R, therefore using 11 times more memory than is required. Last but not least, flags are stored on 1 bit, whereas R uses 32 bits. This is 32 times more memory than is needed. As a consequence, a LAS object is 2 to 3 times larger than it needs to be.
Secondly, the way the point cloud is stored and the way R works implies that copies will be made of the point cloud either in the user’s workspace or internally. Considering that point clouds can be huge it is important to be aware of this point.
Let’s assume we have loaded a large las file that uses 1 GB of R memory.
Suppose we now want to remove a few outliers above 50 m. One can write the following:
And the user now has two objects:
las.original
of size 1 GBlas.denoised
that is also 1 GB, because we only removed
a dozen or so points out of millions.This uses 2 GB of memory. This is how R works. When a vector is subsetted it is necessarily copied. We talk about deep copies. In regular data processing it rarely matters and this behavior is barely noticeable. Indeed, it is rare that data uses a lot of memory. But LiDAR datasets are often massive, and this necessitates that users must carefully consider memory usage to avoid running out of RAM.
In the previous example we showed a deep copy. A deep copy means that the point cloud is actually copied into the memory. A deep copy occurs when the number of points of the output is different from the number of points of the input. But many functions return the same number of point as the input. In such cases only shallow copies are made. For example, when classifying points into ground and non-ground:
In this case the vectors that store the X Y Z coordinates as well as
those that store the Intensity, ReturnNumber, NumberOfReturn and other
attributes were not modified by the function. Only the contents of the
‘Classification’ attribute were modified. In this case
las.classified
and las.original
, even though
they are two different objects, share the same memory for X Y Z, and so
on, but the attributes ‘Classification’ are different. In
conclusion:
las.original
is of size 1 GBlas.classified
is also 1 GB.But both together they are not equal to 2 GB, but ~1.1 GB because they share the same memory. The content of the original LAS object was shallow copied. An understanding of the concepts of deep and shallow copies is important for optimizing your scripts.
As we have seen, because of the way R is designed, lidR uses a large
amount of memory anyway. To deal with this limitation
readLAS
has two optimizations: the parameter
select
and the parameter filter
.
select
To save memory only useful data can be loaded. readLAS
can take an optional parameter select
which enables the
user to selectively load the data of interest. For example, one can load
only the X Y Z
fields. This selection is done at the C++
level while reading and is memory-optimized.
filter
While select
enables the user to select “columns” (or
attributes) while reading files, filter
allows selection of
“rows” (or points) while reading. Again, the selection is done at the
C++ level and is memory-optimized so not a single bit is lost at the R
level. Removing data at reading time that is superfluous for your
purposes saves memory and decreases computation time.